STOCHASTIC MODELING AS A MEANS OF AUTOMATIC SPEECH RECOGNITION

James K. Baker
Carnegie-Mellon University

Automatic recognition of continuous speech involves estimation of a sequence X(1), X(2),
X(3), ..., X(T) which is not directly observed (such as the words of a spoken utterance), based on
a sequence Y(1), Y(2), Y(3), ..., Y(T) of related observations (such as the sequence of acoustic
parameter values) and a variety of sources of knowledge. Formally, we wish to find the sequence
x[1:T] which maximizes the a posteriori probability Pr( X[1:T] = x[1:T] | Y[1:T] = y[1:T], A, L, P,
S ), where A, L, P, S represent the acoustic-phonetic, lexical, phonological, and syntactic-semantic
knowledge. A speech recognition system must attempt to approximate a solution to this problem,
whether or not the system uses a formal stochastic model.

The DRAGON speech recognition system models the knowledge sources as probabilistic
functions of Markov processes. The assumption of the Markov property allows the use of an
optimal search strategy. The DRAGON system finds the sequence x[1:T] which maximizes the
above probability, as given by the Markov model. In effect, the system searches all possible
sentences in the grammar, all possible pronunciations of each sentence, and all possible dynamic
time warpings of each such phonetic string to best fit it to the acoustic observations. This optimal
search is carried out by the procedure expressed in equations (1) and (2).

(1)  \gamma(t,j) = \max_i \{ \gamma(t-1,i) \, \Pr( X(t)=j \mid X(t-1)=i, A,L,P,S )
                            \Pr( Y(t)=y(t) \mid X(t-1)=i, X(t)=j, A,L,P,S ) \}

Let I(t,j) be any value of i for which the above maximum is achieved.

(2)  x(t) = I(t+1, x(t+1))

The use of a general theoretical framework, with an explicit representation for the solution
process, greatly simplifies the speech recognition system. Equations (1) and (2) represent the
entire recognition process. Despite its simplicity the system can, to some degree, use knowledge
from each of the domains A, L, P, and S.

A simplified implementation of the DRAGON system has been developed using knowledge A
and L, and some of the knowledge from S. This implementation has been tested on 102 utterances
from 5 interactive computer tasks. The size of the integrated Markov network representing the
knowledge sources is 410, 702, 916, 498, and 2356 states, respectively, for the 5 tasks whose
vocabulary sizes are 24, 66, 37, 28, and 194 words, respectively, and which have grammars of
varying degrees of complexity. The time required for recognition of an utterance is proportional to
the length of the utterance and is given approximately by the expression (recognition time) =
(utt. length)(20.9 + ()67(net size)). Since a complete optimal search is performed, the recognition
time is independent of the amount of noise in the signal or the number of errors in intermediate
recognition decisions. The system correctly recognized 49% of the utterances and correctly
identified 83% of the 578 words.
INTRODUCTION


Speech recognition, a task which humans do efficiently and well, is very difficult to do by
automatic procedures. There is a great deal of ambiguity in the actual acoustic signal, ambiguity
which can be resolved only by applying other sources of knowledge in addition to the acoustic
signal ([A1], [R7], [N2]). In recent years much research has been devoted to developing the other
sources of knowledge that are available in analyzing speech which is restricted to a specialized
domain of discourse ([R4], [R5], [T1], [D1], [P2], [W3], [F2], [B6], [W1], [M], [J3]). In such a
specialized domain there is generally a restricted vocabulary, so one source of knowledge is the
lexical knowledge. The utterances are constrained to be grammatical, and sometimes the grammar
is a special restricted one, so there is syntactic knowledge. In some of the systems the specialized
domain is an interactive task with the computer as a participant. Thus there is an operational
definition of whether an utterance is "meaningful" (that is, can the computer interpret the
utterance in relation to the interactive task), and therefore there is a kind of semantic
knowledge ([R6]).

In order to apply these sources of knowledge in speech recognition, it is necessary to represent
this knowledge in a form that can be compared with the acoustic observations. There are two
operations which are essential in any speech recognition system: searching and matching. Suppose
one knowledge source, such as syntax, hypothesizes a word or a sequence of words. This
hypothesis can only be verified by matching the words with the events observed by the other
sources of knowledge, such as the actual acoustic signal. A matching procedure is needed to
evaluate any particular hypothesis. A searching procedure is needed to explore the space of
possible hypotheses.

SEARCHING AND MATCHING IN SPEECH RECOGNITION SYSTEMS

The various speech recognition systems which have been developed use a great variety of
searching and matching procedures and employ them in many different ways. The DRAGON
speech recognition system, the subject of this thesis, is based on a systematic use of a particular
abstract model to represent many of the sources of knowledge needed for speech recognition. This
uniformity of representation then allows a powerful general searching/matching technique to be
applied to the speech recognition system as a whole. First let's consider some of the ways in which
searching and matching procedures are used in other speech recognition systems.

The HEARSAY I system ([E2], [R3], [R4], [R5]) employs a hypothesize-and-test paradigm.
There is a separate programming module for each source of knowledge which is represented. Each
module is responsible for generating hypotheses based on its own internal knowledge. Each
hypothesis is then verified by each of the modules (that is, each module matches the hypothesis
against its own knowledge) and a combined rating is computed. The modules communicate with
each other primarily by stating hypotheses about the sequence of words, and each module has its
own matching procedures for relating such "word-level" hypotheses to its own specialized
knowledge. The search strategy is basically a best-first tree search. Words are hypothesized
proceeding left-to-right in the utterance. At any point in the analysis new hypotheses are
generated which are extensions of the best partial sequence of words obtained so far in the analysis.
On the next round of the analysis, either the best such extension becomes the best partial sequence
or, if all such extensions get sufficiently low ratings, a previous partial sequence (which had been
the second best partial sequence) is reactivated.
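
The control flow of a best-first tree search can be illustrated with a short Python sketch. This is
a generic illustration under stated assumptions, not the HEARSAY I implementation: the agenda is a
priority queue, ratings are assumed to lie in (0, 1] and to multiply along a partial sequence, and
the names best_first_search, extend, is_complete, and the toy RATINGS table are hypothetical.

```python
import heapq

def best_first_search(start, extend, is_complete):
    """Pop the best-rated partial sequence, extend it, and repeat.

    extend(words) returns (next_word, rating) pairs for a partial sequence.
    Ratings multiply along a sequence (an assumption of this sketch).
    """
    # heapq is a min-heap, so scores are stored negated to pop the best first.
    agenda = [(-1.0, list(start))]
    while agenda:
        neg_score, words = heapq.heappop(agenda)
        if is_complete(words):
            return words, -neg_score
        for word, rating in extend(words):
            heapq.heappush(agenda, (neg_score * rating, words + [word]))
    return None, 0.0

# Hypothetical ratings: "what" looks best at first, but its extension rates poorly.
RATINGS = {
    (): [("what", 0.6), ("where", 0.4)],
    ("what",): [("did", 0.5)],
    ("where",): [("did", 0.9)],
    ("what", "did"): [("you", 1.0)],
    ("where", "did"): [("you", 1.0)],
    ("what", "did", "you"): [("see", 1.0)],
    ("where", "did", "you"): [("go", 1.0)],
}

def extend(words):
    return RATINGS.get(tuple(words), [])

def is_complete(words):
    return len(words) == 4

words, score = best_first_search([], extend, is_complete)
```

With these made-up ratings the search first follows "what", then reactivates the second best
partial sequence "where" once the extension "what did" receives a low rating.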


In the HEARSAY II system ([L2]) the matching and search mechanisms are much more
general and flexible. Hypotheses are not restricted to the word level, but instead are organized
into an indefinite number of levels ranging from sub-phonetic acoustic segments to semantics and
pragmatics. There are a large number of independent knowledge source modules. Each knowledge
source repeatedly applies matching procedures to compare the data structure of existing
hypotheses with its internal knowledge base. Whenever a match is found the knowledge source
takes the appropriate action to add an hypothesis or otherwise modify the data structure. The
search strategy consists of scheduling which knowledge sources get activated and in what order,
based on a variety of scores and ratings for the hypotheses that are in the data structure at a given
time.


In the Automatic Recognition of Continuous Speech (ARCS) systems ([D1], [T1], [T2], [T3],
[H1], [P2], [R1]) a variety of tests are applied to the acoustic signal to derive a (noisy) phonetic
string and there is a language model for generating sequences of words. The conversion of the
noisy phonetic string to an orthographic string is then performed by searching and matching
procedures. For each word there is a network representing all permitted pronunciations of the
word. The conditional probability of a particular word producing a given phonetic string can be
computed explicitly, and is used to measure the degree of match. The search procedure is a
best-first tree search implemented by a sequential decoding algorithm. Earlier versions of the
ARCS system had the same general structure, but performed the matching at the phonetic level
rather than at the word level.

The knowledge sources in the SPEECHLIS system ([B7], [N1], [R9], [W2], [W3]) represent
their information in lattice structures which show all the alternatives at any point in time. The
word lattice is generated by matching each lexical item with the entries in the segment lattice. A
semantic component searches the word lattice to develop "theories" of semantically related words.
The semantic component continues to work on the theories with the greatest likelihood scores.
When the semantic component can add no more words to a theory, the theory is passed to a
syntax component which performs a parse and fills in any gaps.

The CASPER system ([F2], [K1]) performs a match between lexical items and a noisy
phonetic sequence by using multiple dictionary entries, phonological rules embedded in the
dictionary, and a "degarbling" procedure. The search is controlled by an augmented context-free
grammar which performs a left-to-right, bottom-up parse.

The Vocal Data Management System ([B6], [R8]) developed at SDC employs a strategy of
"Predictive Linguistic Constraints." The parser attempts to predict phrases based on a simple user
model, thematic patterning, and grammatical and semantic constraints. Fixed directional parsing is
replaced by a more general approach so that processing may be initiated at any point in the
utterance. Lexical items are matched against the acoustic-phonetic data by a word mapper and a
syllable mapper. The word mapper handles alternate pronunciations of a word, decides likely
times for syllable boundaries, and checks for co-articulation effects across syllable boundaries.
The syllable mapper compares a syllable candidate with the sequence of acoustic parameters.

The SRI Speech Understanding System ([P3], [P4], [W1]) uses a special "word function" for
each item in the lexicon. Each word function consists of a series of Fortran subroutines that look
for a match between its particular word and data from a variety of sources based on parameters
extracted from the acoustic signal. The parser executes a top-down, "best-first" strategy. In
addition to its parsing function, it calls on the other components and coordinates information
among them.

The Univac Speech Understanding System ([L1]) uses a prosodically-guided strategy.
Prosodic features are used to break sentences into phrases, locate the stressed syllables within
those phrases, and guide procedures for both phone classification and higher level linguistic
analysis. This strategy requires a search procedure which is able to initiate processing at any point
in the utterance as indicated by the prosodic features. Specific search and matching procedures
have not yet been implemented for this system.

The speech recognition system being developed at the IBM Watson Research Center ([B1],
[J3]) is based on a linguistic sequential decoder. The decoder consists of four major subparts: 1) a
statistical model of the language; 2) a phonemic dictionary and statistical phonological rules; 3) a
phonetic matching algorithm; 4) word level search control. The search procedure is a stack
decoding algorithm which seeks that word sequence which has the maximum a posteriori
probability, conditional on the language and the observed acoustic sequence. Statistical matching
is done between hypothesized words and a noisy phonetic string obtained by acoustical analyses.

Even these greatly simplified descriptions make it clear that there is a great variety of ways in
which searching/matching strategies can be implemented. However, certain common features can
be distinguished. Most of the systems perform matching only at one level. Generally the matching
is between lexical items and a noisy phonetic string (ARCS, SPEECHLIS, CASPER,
IBM-Watson). Thus, for example, in these systems, words and phrases are not directly matched to
the acoustics. For most of the systems, the search is controlled primarily at the word level
(HEARSAY I, ARCS, SPEECHLIS, CASPER, SDC, SRI, IBM-Watson). Only two systems
(ARCS, IBM-Watson) have explicit statistical models from which to derive matching scores.

In addition to the general purpose searching/matching which is usually used in transforming a
noisy phonetic string to a word string, several specialized procedures are used. SDC has a mapping
between syllables and acoustic parameters. SRI matches words directly with acoustics. The early
ARCS system matched the language directly onto the noisy phonetic string. The segment data in
the SPEECHLIS system is a lattice of alternatives, so matching even a single lexical item involves a
small lattice search. Each of the modules in the HEARSAY systems includes specialized matching
procedures.

FEATURES OF THE DRAGON SYSTEM

The fundamental idea behind the DRAGON system is that each of the knowledge sources can
be represented by a single, general, abstract model. Then powerful general search/match
algorithms can be employed without worrying about all the special characteristics of each
individual knowledge source. These special characteristics are not ignored, but they get
incorporated into the data structures and not into the searching/matching procedures. The model
which is used throughout the DRAGON system is that of a probabilistic function of a Markov
process ([B8]).

The sequence of random variables Y(1), Y(2), Y(3), ..., Y(T) is said to be a probabilistic
function of a Markov process if there is a sequence of random variables X(1), X(2), X(3), ...,
X(T) such that the sequences of X's and Y's satisfy equations (5) and (6) of Chapter II. The
techniques for analyzing such a system are described in Chapter II. The interpretation is that the
Y's are a sequence of random variables that we observe and which depend probabilistically on the
X's which we do not observe. We wish to make inferences about the values of the X's from the
observed values of the Y's. Chapter III describes how the knowledge sources in a speech
recognition system can be represented in terms of this type of model. Chapter IV describes a
simplified implementation of these ideas. Performance results are given which show that even this
greatly simplified implementation is a complete and powerful speech recognition system.


The important features of the DRAGON system are:

1)  Generative form of model;

2)  Hierarchical arrangement of knowledge sources;

3)  Integrated network representation;

4)  General theoretical framework;

5)  Optimal stochastic search.

In comparing the features of different speech recognition systems, attention is often focused on the
control structures and the methods of communication among the knowledge source modules. Thus
a system might be characterized by whether the analysis proceeds top-down or bottom-up (or
some mixture), whether there is a best-first tree search or some other control mechanism, and
whether the analysis proceeds in a strict left-to-right fashion or can start at any point in the
utterance. For several reasons, the DRAGON system cannot be easily characterized by these
conventional dichotomies, so the discussion of them is postponed until the major features of the
system are described.

(1)  Generative form of the model

The generative form is a natural one for a probabilistic function of a Markov process.
Generative rules are formulated as conditional probabilities. For example, if we know which
phone occurs at a given time, vocal tract models allow us to predict the values of the acoustic
parameters. That is, a conditional probability distribution is defined in acoustic parameter space.
If we know which word occurs during a given segment of time, phonological rules allow us to
estimate the probability of various phone sequences representing different pronunciations of the
word. A statistical model for the errors of an automatic phone classifier allows us to calculate the
probability of the classifier producing a specific sequence of labels, conditional on the true
sequence of phones being a particular phone sequence. The grammar for a specific task domain
produces a conditional probability distribution in the space of word sequences such that
ungrammatical sequences have zero probability.

Each of the knowledge sources in the DRAGON system is represented in a generative form as
a probabilistic function of a Markov process. However, Bayes' theorem allows the computation to
be performed analytically. The model tells the conditional probability of producing a specific
sequence of acoustic parameter values from a specific sequence of words. Applying Bayes'
theorem, we can compute the a posteriori probability of a sequence of words from the observed
sequence of acoustic parameter values.
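
As a small numerical illustration of this use of Bayes' theorem, the Python sketch below turns
generative likelihoods Pr(acoustics | words) into a posteriori probabilities Pr(words | acoustics).
The function name posterior and all of the probability values are hypothetical, chosen only to
make the arithmetic easy to follow; they are not measurements from the DRAGON system.

```python
def posterior(prior, likelihood):
    """Bayes' theorem: Pr(w | y) = Pr(y | w) Pr(w) / sum over w' of Pr(y | w') Pr(w')."""
    joint = {w: prior[w] * likelihood[w] for w in prior}
    evidence = sum(joint.values())
    return {w: p / evidence for w, p in joint.items()}

# Hypothetical numbers: two equally likely sentences, and the generative
# model's probability of producing the observed acoustics from each.
prior = {"what did you see": 0.5, "where did you go": 0.5}
likelihood = {"what did you see": 0.002, "where did you go": 0.006}

post = posterior(prior, likelihood)
```

Only the ratio of the likelihoods matters here, since the normalizing sum is the same for every
candidate word sequence.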


(2)  Hierarchical arrangement of knowledge sources

The sources of knowledge are organized into a hierarchy based on the following observation:
The "higher" levels of a speech recognition system change state less frequently than the "lower"
levels. Thus a single syntactic-semantic state corresponds to a sequence of several words; a single
word corresponds to a sequence of several phones; and a phone corresponds to a sequence of
acoustic parameter values. The hierarchy is not absolute (for example, syntax and semantics are
together a single multi-level process), but it provides a convenient means for combining the
Markov processes which represent the individual sources of knowledge.

To see how the knowledge can be represented as a hierarchy of generative models, let's
consider a simplified example. Consider a language with only two sentences: "What did you see?"
and "Where did you go?" At the word level this language can be represented by the network
shown in Figure 1.

GRAMMAR NETWORK

where -> did -> you -> go
what  -> did -> you -> see

FIGURE 1

This  model  is  generative  in  the  sense  that  if  we  know  a partial  sequence  of  words  (e.g.  "What  did") 
the  model  tells  exactly  which  word  can  come  next  ("you").  But  we  do  not  directly  observe  the 
words  (we  only  observe  the  associated  acoustic  events),  so  we  must  compute  the  a posteriori 
probability  of  any  word  sequence  using  the  techniques  of  Chapter  II. 
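
The generative reading of the grammar network can be sketched in Python. This is an illustrative
encoding only: the tuple representation SENTENCES and the helper next_words are assumptions of
the sketch, not data structures described in the thesis.

```python
# The two-sentence language of Figure 1, one tuple of words per sentence.
SENTENCES = [
    ("what", "did", "you", "see"),
    ("where", "did", "you", "go"),
]

def next_words(prefix):
    """Words that may follow the partial word sequence `prefix`."""
    prefix = tuple(prefix)
    n = len(prefix)
    return sorted({s[n] for s in SENTENCES if len(s) > n and s[:n] == prefix})

after_what_did = next_words(["what", "did"])
first_words = next_words([])
ungrammatical = next_words(["who"])
```

A prefix that matches no sentence has no continuation, which corresponds to ungrammatical
sequences receiving zero probability.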
WORD NETWORK

/w/ -> /A/ -> /t/

FIGURE 2


In the next lower level of the hierarchy we represent the relationship between the words and
the phones. To keep the network simple, only a single pronunciation is represented for each word.
For example, the network for "what" is shown in Figure 2. It is also possible to add another level
to the hierarchy connecting the phones to the expected acoustic parameter values. The stop
consonants and the diphthongs are broken up into several sub-phonemic segments. The network
for [t] is shown in Figure 3. The connection with acoustic parameters is then represented by a
table giving the statistical distribution of parameter values for each type of segment. Phonological
and acoustic-phonetic rules, which are omitted from this example, could be represented either at
the broad phonetic level (such as, if the /t/ is flapped) or at the acoustic segment level (whether
the /t/ is released and its degree of aspiration, if released).

PHONE NETWORK

-  ->  th      (each node also has an arc pointing back to itself)

(where - represents the pause portion, and th represents the release/aspiration)

FIGURE 3


The nodes in Figure 3 have arcs which point back to themselves because we are representing
two processes which are asynchronous with respect to each other. That is, the acoustic parameters
are measured at fixed time intervals (say once every 10 milliseconds), but each sub-phonemic
acoustic segment lasts for an unknown period of time. So, if we time our stochastic process at one
step every 10 milliseconds, then the process may stay in the same state for several units of time, as
indicated by an arc returning to the same node. A phone which consists of a single acoustic
segment is represented by a phone network with a single node, but with a loop from the node back
to itself, again indicating that the process may stay in this state for several units of time.
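
The effect of the self-loops can be made concrete by listing the state sequences they allow. The
Python sketch below enumerates every way a fixed number of 10-millisecond frames can be divided
between the segments of a phone, taken in order. The function name alignments and the segment
labels "pause" and "release" are assumptions introduced for illustration.

```python
def alignments(segments, n_frames):
    """All state sequences of length n_frames that pass through `segments`
    in order, each segment covering at least one frame. Extra frames are
    absorbed by the self-loop on a node."""
    if len(segments) == 1:
        return [[segments[0]] * n_frames] if n_frames >= 1 else []
    result = []
    # The first segment stays in place for k frames, leaving at least one
    # frame for every remaining segment.
    for k in range(1, n_frames - len(segments) + 2):
        for rest in alignments(segments[1:], n_frames - k):
            result.append([segments[0]] * k + rest)
    return result

# The two-node network of Figure 3 observed for four frames (40 milliseconds).
paths = alignments(["pause", "release"], 4)
```

For two segments and four frames there are three possible sequences, one for each point at which
the process can leave the pause node.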


(3)  Integrated network representation


To describe a point in the hierarchical state space, we must describe its position in a network
at each level of the hierarchy. For example, the description (1) "the pause segment" of (2) "the
[t]" of (3) "the word 'what'," describes a particular point in the hierarchical state space in our
simple example. Since each of the networks is finite, it is possible to define a new network with a
separate node for each point in the hierarchical space. In terms of the knowledge represented, this
new network and the hierarchy of networks are equivalent. The change is primarily one of
convenience. The integrated network representing our simplified example is shown in Figure 4.
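
One way to picture the construction is to expand each word into its phones and each phone into
its segments, labelling every resulting node with its position at each level. The Python sketch
below does this for the single word "what". It is a simplified illustration: the dictionaries
PRONUNCIATION and SEGMENTS, the tuple node labels, and both function names are hypothetical, and
the phone symbols are written as they appear in Figure 2.

```python
PRONUNCIATION = {"what": ["w", "A", "t"]}    # word level to phone level (Figure 2)
SEGMENTS = {"t": ["pause", "release"]}       # phone level to segment level (Figure 3)

def integrated_nodes(words):
    """One node per point in the hierarchical state space."""
    nodes = []
    for position, word in enumerate(words):
        for phone in PRONUNCIATION[word]:
            # Phones without an entry in SEGMENTS are treated as one segment.
            for segment in SEGMENTS.get(phone, [phone]):
                nodes.append((position, word, phone, segment))
    return nodes

def integrated_arcs(nodes):
    """Each node points to itself and to the next node in the chain."""
    arcs = {node: [node] for node in nodes}
    for a, b in zip(nodes, nodes[1:]):
        arcs[a].append(b)
    return arcs

nodes = integrated_nodes(["what"])
arcs = integrated_arcs(nodes)
```

The third node, (0, "what", "t", "pause"), is the point described in the text as the pause
segment of the [t] of the word "what".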

INTEGRATED  NETWORK 


FIGURE  4 

Actually it is possible to represent more knowledge in the integrated network than in the
hierarchical system. For example, phonological rules which apply across word boundaries (such as
the palatalization in the word pair "did you") may be used to make modifications to the network.
Note that the integrated network, because it is derived in a special way from a hierarchy, is very
sparse. In the example each node (except the end nodes) is connected to (has an arc pointed
toward) only itself and one other node. Even with a more general language and networks
representing phonological rules, almost any node that is not adjacent to a word boundary would be
connected only to itself and one, two, or three other nodes. Thus, in a network with thousands of
nodes, there are only two or three arcs per node (instead of the thousands which would be
possible). This property of sparseness has implications for the implementation of the speech
recognition system, as is discussed in Chapters II and IV.
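
The practical consequence of sparseness can be seen by counting arcs. The Python sketch below
builds a plain left-to-right chain, which is a simplification of the real networks, and compares
its arc count with the number of arcs a fully connected network of the same size would have. The
node count 2356 is the size of the largest network reported in this thesis; the function names are
assumptions of the sketch.

```python
def chain_network(n_nodes):
    """Adjacency lists for a chain: each node points to itself and its successor."""
    return {i: [i] + ([i + 1] if i + 1 < n_nodes else []) for i in range(n_nodes)}

def arc_count(network):
    return sum(len(successors) for successors in network.values())

n = 2356
sparse_arcs = arc_count(chain_network(n))   # two arcs per node, one for the last node
dense_arcs = n * n                          # every node connected to every node
```

Storing only the arcs that exist, as adjacency lists do, keeps both memory and per-frame work
proportional to the number of arcs rather than to the square of the number of nodes.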

The size of the integrated network for a given task depends on the vocabulary size, the
complexity of the grammar, and on some of the details of the implementation. The five tasks
discussed in Chapter IV have vocabulary sizes of 24, 66, 37, 28, and 194 words, respectively. The
number of nodes in the integrated network is 410, 702, 916, 498, and 2356, respectively. Even
the largest network is small enough so that the recognition system described in Chapter IV can
keep all of its intermediate computational results in the computer's core memory with no need to
use secondary storage.
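
A quick calculation shows that network size does not simply scale with vocabulary size. The
Python lines below divide the node counts by the vocabulary sizes for the five tasks; the fifth
node count, 2356, is the figure stated in the abstract. The variable names are illustrative.

```python
VOCABULARY = [24, 66, 37, 28, 194]       # words per task
NODES = [410, 702, 916, 498, 2356]       # nodes in each integrated network

nodes_per_word = [round(n / v, 1) for n, v in zip(NODES, VOCABULARY)]
```

The ratio ranges from about 10.6 to about 24.8 nodes per word, which is consistent with the
statement that grammar complexity and implementation details also affect the size.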

Note that we go from a group of separate knowledge sources to an integrated network
representation in essentially three steps. First, each knowledge source is represented as a
probabilistic function of a Markov process. The details of this step are described in Chapter III.
In this chapter the skeleton of the idea is exposed by way of the associated network. Second, the
knowledge sources are arranged in a hierarchy. In a sense, it is this step which is crucial. It relies
on the special relationships among the knowledge sources for speech recognition systems. It would
not necessarily be applicable to knowledge sources for other problems even if the knowledge
sources are representable as probabilistic functions of a Markov process. Third, the hierarchy of
networks is converted into an equivalent single network (and the hierarchy of Markov processes is
replaced by a single Markov process). Although this final step changes the apparent external
structure of the system, it does not change the substance.

(4)  General theoretical framework

As stated before, the DRAGON system relies throughout on a particular abstract model, that
of a probabilistic function of a Markov process. A sequence of random variables Y(1), Y(2),
Y(3), ..., Y(T) is said to be a probabilistic function of the Markov process X(1), X(2), X(3), ...,
X(T) if these random sequences satisfy equations (5) and (6) of Chapter II. These equations may
be paraphrased as requiring that, for any t, X(t) depends only on X(t-1) and Y(t) depends only on
X(t) and X(t-1). Chapter III describes how various knowledge sources may be represented by
such a model.
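
The paraphrase can be written as two conditional independence statements. The LaTeX below is one
way to express it and is offered as a reading aid only; the exact form of equations (5) and (6) is
given in Chapter II and is not reproduced here.

```latex
% X(t) depends only on X(t-1):
\Pr\bigl( X(t)=j \mid X(1),\dots,X(t-1),\; Y(1),\dots,Y(t-1) \bigr)
  = \Pr\bigl( X(t)=j \mid X(t-1) \bigr)

% Y(t) depends only on X(t-1) and X(t):
\Pr\bigl( Y(t)=y \mid X(1),\dots,X(t),\; Y(1),\dots,Y(t-1) \bigr)
  = \Pr\bigl( Y(t)=y \mid X(t-1),\, X(t) \bigr)
```

These are the two factors that appear inside the maximization in equation (1) of the abstract.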


The formulas that the model produces are similar to the formulas used in other statistically
based speech recognition systems (ARCS and IBM-Watson). In certain ways, either system can
be considered as a special case of the other. The difference is more one of emphasis than one of
kind. The emphasis in the DRAGON system is one of representing each of the knowledge sources
in a uniform theoretical framework. Thus specialized procedures for handling the data for a
particular knowledge source are avoided.


The only specialized procedures are those used in setting up the integrated network to
represent the combined knowledge sources. In recognizing a particular utterance, the only
procedure which is used is one which is based only on the general properties of a probabilistic
function of a Markov process. For example, the type of specialized procedure which is absent is
one which would take acoustic parameters and with a complicated set of rules, thresholds, and
decisions produce a raw phonetic string intended to be as close as possible to a phonetic
transcription of the utterance. As explained in Chapter III, if such a procedure is available, the
DRAGON system can use the phonetic string which is produced. But on the other hand, if such a
procedure is not used, the DRAGON system can operate directly on the acoustic parameters, since
the acoustic-phonetic knowledge can be represented as a probabilistic function of a Markov process
and be incorporated into the hierarchy.


(5)  Optimal stochastic search


The Markov model used in the DRAGON system requires a finite state space. In that sense it
is less general than the augmented network systems (SPEECHLIS, CASPER, SRI) and stack
decoding statistical systems (ARCS, IBM-Watson). However, a large finite network can represent
most of the important information, and some of the things which it cannot represent are irrelevant
in a recognition problem in which the input is a noisy phonetic string with arbitrary insertions and
deletions. The finite state space and the Markov model make possible the powerful algorithms
which are described in Chapter II.

The search algorithm of the DRAGON system is unique in that rather than search a tree (the
tree of possible word sequences) one branch at a time in some best-first or depth-first manner, it
searches the entire space of all possible paths through its network. All paths of a given length are,
in effect, searched in parallel. At the end of the analysis a path is obtained which is an optimum
over all possible paths through the network. This path represents that interpretation of an
utterance which, among all possible interpretations, best matches the given observed values of the
acoustic parameters.

To search this entire space may seem to be drastic, but with the Markov model and the
algorithms of Chapter II, it can be done very efficiently. These algorithms are not new. The
inductive computation of the best partial sequence, as done by equation (18) of Chapter II, is an
application of dynamic programming to the general network search problem ([B9]). It corresponds
to an algorithm used in communications and coding theory, known as the Viterbi algorithm ([V1]).
There are other algorithms for sequential decoding ([F1], [J1], [J2]), which are also based on
maximizing the a posteriori probability according to such a stochastic model, and several of them
have been successfully applied to speech recognition (ARCS and IBM-Watson).

The number of computations required to search the space of all possible paths through the
network is proportional to (the length of the utterance) times (the number of arcs in the network).
For a given network, the computation time is linear in the length of the utterance and is
independent of the amount of noise or the number of errors in any input string. This property is in sharp
contrast to depth-first or best-first algorithms, for which there is no effective upper bound for the
amount of computation (except a search of the entire tree, one branch at a time). The sequential
search algorithms do, in fact, occasionally need to be terminated before completion of the analysis
because they exhaust the available time or storage.

On the other hand, although the Markov model permits a complete optimum search in a time
that is linear in the length of the utterance, the proportionality factor is large, especially for large
vocabularies. Many things could be done to reduce the computation time required by the
DRAGON system, and they are an important and interesting area for future research, but in the
work reported in this thesis there has been no attempt to minimize the computation time. Lowerre
([L3]) has rewritten the DRAGON program to execute much faster with no change in recognition
results. The computation times given in Chapter IV, therefore, should be regarded as an upper
bound on the amount of time required by the techniques presented in this thesis and as a
demonstration that complete optimal search is not impossible.
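
The linear cost described above can be illustrated with a minimal sketch. It is not the DRAGON program: the toy network, the probabilities, and the names `time_synchronous_search`, `arcs`, and `updates` are all assumptions made for illustration. Every arc is examined exactly once per observation, so the amount of work depends only on the length of the input and the number of arcs, not on how noisy the input is.

```python
def time_synchronous_search(arcs, n_states, y):
    """Best-path probability over all paths, visiting every arc once per observation.

    arcs: list of (i, j, a_ij, b_ij), where b_ij maps an observed symbol to
          PROB(symbol | transition i -> j).  State 0 is the start state.
    Returns (probability of the best path, number of arc updates performed).
    """
    best = [0.0] * n_states
    best[0] = 1.0                      # every path begins in the start state
    updates = 0
    for symbol in y:
        nxt = [0.0] * n_states
        for i, j, a_ij, b_ij in arcs:
            updates += 1
            score = best[i] * a_ij * b_ij.get(symbol, 0.0)
            if score > nxt[j]:         # keep only the best path entering state j
                nxt[j] = score
        best = nxt
    return max(best), updates


# Toy network with illustrative numbers only.
emit_a = {"a": 0.9, "b": 0.1}
emit_b = {"a": 0.2, "b": 0.8}
arcs = [(0, 1, 0.5, emit_a), (0, 2, 0.5, emit_b),
        (1, 1, 0.6, emit_a), (1, 2, 0.4, emit_b),
        (2, 2, 0.7, emit_b), (2, 1, 0.3, emit_a)]

clean = list("aaabbb")
noisy = list("ababba")
for y in (clean, noisy):
    prob, updates = time_synchronous_search(arcs, 3, y)
    print(len(y), len(arcs), updates, prob)
```

Both inputs have the same length, so both runs perform the same number of arc updates; only the resulting best-path probability differs.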


The DRAGON system cannot be characterized as either top-down or bottom-up because it
has aspects of both types of system. The models are given in a generative form, which is normal
for top-down systems. However, by applying Bayes' formula the analysis proceeds in the analytic
rather than the synthetic direction. But even more significant is the fact that the integrated
representation makes it impossible to distinguish whether the acoustic knowledge is helping to
direct the syntactic analysis, or the syntactic knowledge is helping to direct the acoustic analysis.
Instead of a system with separate components with specific feed-back and feed-forward
mechanisms for transmitting information, the system is completely integrated.


The DRAGON system represents an extreme position in terms of its search strategy. Most
systems use some form of best-first tree search with procedures for backtracking when the analysis
requires it. By contrast, the DRAGON system uses a complete optimal search, which would be like
a breadth-first tree search except that the Markov model reduces the tree search to a much smaller
network search.


The particular implementation which is discussed in Chapter IV is restricted to a strict
left-to-right analysis, and the formulas in Chapters II and III have been expressed in that form. It
would be possible to generalize this system to have the analysis proceed from any point in the
utterance, but because there is already a complete optimal search, there is no advantage in doing
so. It is not necessary to start the analysis at "islands of reliability" because any path which gives
the correct interpretation of such an island is eventually considered in the optimal search (unlike a
best-first search in which analyzing unreliable data first can cause the correct interpretation of
later reliable data never to be considered). Because the computation time is a linear function of
the length of the utterance there is no computational advantage in breaking the utterance into
several pieces.

The remainder of this thesis is divided into three chapters. Chapter II describes the abstract
model which is used in the DRAGON system. In the DRAGON system each source of knowledge
is represented as a probabilistic function of a Markov process ([B8]). Chapter II presents the
general mathematical properties for such systems, but omits the details which are specific to speech
recognition. Chapter III presents techniques for representing the knowledge sources necessary for
speech recognition. Sometimes several alternative techniques are described for representing a
particular source of knowledge. Some of the representation techniques described in Chapter III
are used in the simple implementation discussed in Chapter IV. Some of the other techniques have
been tested in separate modules but not in a complete recognition system. Some of the techniques
have not yet been tested. In particular, no attempt has been made to represent a semantic
component or even to obtain a weighted probabilistic grammar. Chapter IV describes a speech
recognition system, based on the general model of Chapter II, obtained by implementing some of
the representation techniques presented in Chapter III. A summary is presented of recognition
results for 102 utterances. The system correctly recognized 49% of the 102 utterances and
correctly identified 83% of the 578 words.


Chapter  II  — GENERAL  MODEL 


INTRODUCTION 

The DRAGON speech recognition system utilizes the theory of a probabilistic function of a
Markov process. In this chapter an introduction is given to the general theory. Chapter III
explains how the knowledge sources in a speech recognition system can be represented.

Let Y(1), Y(2), Y(3), ..., Y(T) be a sequence of random variables representing the external
(acoustic) observations. Let X(1), X(2), X(3), ..., X(T) be a sequence of random variables
representing the internal states of a stochastic process such that the probability distributions of the
Y's depend on the values of the X's, but the X's are not directly observed. As a convenient
abbreviation we use a bracket and colon notation to represent sequences. Thus, Y[1:T] represents
Y(1), Y(2), Y(3), ..., Y(T) and X[1:T] represents X(1), X(2), X(3), ..., X(T). Let y[1:T] be the
observed sequence of values for the random variables Y[1:T].

GENERAL  FORMULATION 

We wish to make inferences about the sequence X[1:T] in light of the knowledge of y[1:T].
For example, we would like to know the conditional probability PROB( X(t)=j | Y[1:T]=y[1:T] )
for each t and j (the conditional probability of a specific internal state at a specific time, given the
entire sequence of external observations). Assuming we have a model for speech production, we
can evaluate the a priori probability PROB( X[1:T] ). Assuming a model for the generation of
acoustic events associated with a specific sequence of internal states, we can evaluate the
conditional probability PROB( Y[1:T]=y[1:T] | X[1:T]=x[1:T] ). (That is, the model yields conditional
probabilities of external observations, given the sequence of internal states.) Thus we know the
conditional probabilities in the generative or synthetic form.

We can compute the desired conditional probabilities using Bayes' formula

(1)  $\mathrm{PROB}(X(t)=j \mid Y[1:T]=y[1:T])$
       $= \mathrm{PROB}(X(t)=j,\ Y[1:T]=y[1:T]) \,/\, \mathrm{PROB}(Y[1:T]=y[1:T])$

if we can evaluate the factors on the right hand side. The numerator is given by

(2)  $\mathrm{PROB}(X(t)=j,\ Y[1:T]=y[1:T])$
       $= \sum_{x[1:T]:\,x(t)=j} \mathrm{PROB}(X[1:T]=x[1:T],\ Y[1:T]=y[1:T])$
       $= \sum_{x[1:T]:\,x(t)=j} \mathrm{PROB}(Y[1:T]=y[1:T] \mid X[1:T]=x[1:T])\,\mathrm{PROB}(X[1:T]=x[1:T])$

where the sum is taken over all possible sequences x[1:T] subject to the restriction x(t)=j. (The
joint probability of an internal sequence and an external sequence is the product of the a priori
probability of the internal sequence and the conditional probability of the external sequence given
by the model. The probability for the event X(t)=j is obtained by summing over all internal
sequences which meet that restriction.) We can evaluate the a priori probability that Y[1:T]
would be y[1:T] as

(3)  $\mathrm{PROB}(Y[1:T]=y[1:T])$
       $= \sum_{x[1:T]} \mathrm{PROB}(Y[1:T]=y[1:T] \mid X[1:T]=x[1:T])\,\mathrm{PROB}(X[1:T]=x[1:T])$

where the sum is taken over all possible sequences x[1:T]. (The total probability of an external
sequence is the sum of its joint probability with all possible internal sequences.)

Therefore

(4)  $\mathrm{PROB}(X(t)=j \mid Y[1:T]=y[1:T])$
       $= \mathrm{PROB}(X(t)=j,\ Y[1:T]=y[1:T]) \,/\, \mathrm{PROB}(Y[1:T]=y[1:T])$
       $= \dfrac{\sum_{x[1:T]:\,x(t)=j} \mathrm{PROB}(Y[1:T]=y[1:T] \mid X[1:T]=x[1:T])\,\mathrm{PROB}(X[1:T]=x[1:T])}{\sum_{x[1:T]} \mathrm{PROB}(Y[1:T]=y[1:T] \mid X[1:T]=x[1:T])\,\mathrm{PROB}(X[1:T]=x[1:T])}$

where the sum in the denominator is taken over all sequences x[1:T] and the sum in the numerator
is taken over all such sequences subject to the restriction x(t)=j. (This is the probability of the
internal event X(t)=j conditional on the observed external sequence, as desired.)


The derivation of equation (4) is just a standard application of Bayes' theorem. It represents a
formal inversion of the conditional probabilities from the generative form to the analytic form.
(Note: The word analytic is used here in a special sense. "Analytic" means "taking apart" as
opposed to "synthetic," "generative," or "putting together." In terms of our model, the generative
form predicts the observations (Y's) in terms of the internal sequence (X's). The analytic form
computes the a posteriori probability of the X's conditional on the observed Y's.) The
speech-recognition knowledge sources provide the conditional probabilities in a generative form. They
must be converted into an analytic form to make inferences about a particular utterance from the
observed acoustics. However, the formal inversion formula given in equation (4) is not
computationally practical since in general the set of all possible sequences x[1:T] is prohibitively large. It is
necessary to apply the restrictions of a more specific model to obtain a computationally efficient
formula.
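
To make the cost of the formal inversion concrete, the following minimal sketch evaluates equation (4) by direct enumeration. The function name `posterior_by_enumeration` and the toy generative model (a uniform prior over state sequences and observations that match their state with probability 0.8) are assumptions chosen only for illustration; they are not part of the DRAGON system. The number of terms in the sums grows as (number of states) to the power T.

```python
from itertools import product


def posterior_by_enumeration(states, y, t, prior, likelihood):
    """PROB(X(t)=j | Y[1:T]=y[1:T]) for every j, by summing over all x[1:T].

    prior(x)         -> PROB(X[1:T]=x)             (generative model, assumed given)
    likelihood(y, x) -> PROB(Y[1:T]=y | X[1:T]=x)  (generative model, assumed given)
    t is 1-based, as in the text.
    """
    joint = {j: 0.0 for j in states}
    n_sequences = 0
    for x in product(states, repeat=len(y)):
        n_sequences += 1
        joint[x[t - 1]] += likelihood(y, x) * prior(x)   # one term of the numerator of (4)
    total = sum(joint.values())                           # the denominator, equation (3)
    return {j: joint[j] / total for j in states}, n_sequences


# Toy generative model (hypothetical): all state sequences are equally likely,
# and each observation equals its state with probability 0.8.
STATES = (0, 1)


def toy_prior(x):
    return 1.0 / len(STATES) ** len(x)


def toy_likelihood(y, x):
    p = 1.0
    for y_t, x_t in zip(y, x):
        p *= 0.8 if y_t == x_t else 0.2
    return p


y = (0, 1, 1, 0, 1, 0, 0, 1, 1, 1)
posterior, n_sequences = posterior_by_enumeration(STATES, y, 3, toy_prior, toy_likelihood)
print(posterior, n_sequences)
```

With two states and ten observations the loop already visits 1024 sequences; each additional observation doubles that count.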

MARKOV  MODEL 

The DRAGON speech recognition system assumes that the sequences represent a probabilistic
function of a Markov process [B8]. Specifically, it is assumed that the conditional probability that
X(t)=j given X(t-1) is independent of t and of the values of X[1:t-2], and that the conditional
probability that Y(t)=k given X(t) and X(t-1) is independent of t and of the values of any of the
other X's and Y's. Let B = { $b_{i,j,k}$ } and A = { $a_{i,j}$ } be arrays such that

(5)  $\mathrm{PROB}(Y(t)=y(t) \mid X[1:t]=x[1:t],\ Y[1:t-1]=y[1:t-1])$
       $= \mathrm{PROB}(Y(t)=y(t) \mid X(t-1)=x(t-1),\ X(t)=x(t))$
       $= b_{x(t-1),x(t),y(t)}$

(6)  $\mathrm{PROB}(X(t)=x(t) \mid X[1:t-1]=x[1:t-1])$
       $= \mathrm{PROB}(X(t)=x(t) \mid X(t-1)=x(t-1))$
       $= a_{x(t-1),x(t)}$

This restriction to a Markov model is the fundamental assumption which allows the DRAGON
system to be practical. In the Markov model the conditional probabilities depend only on X(t) and
X(t-1) and not on the entire sequence X[1:T] as in equations (1) to (4). This specialization
makes it possible to evaluate the desired conditional probabilities by an indirect but
computationally efficient procedure.

The Markov assumption might be paraphrased by saying that the conditional probabilities are
independent of context, but such a simple statement would be misleading. Since the state space of
the Markov process for our speech recognition application has not yet been formulated, the
assumption of the Markov properties should be regarded as a prescription to be followed in the
formulation of the state space. Specifically, two situations which differ in "relevant" context must
be assigned two separate states in the state space of the random variables X[1:T]. Then all
"relevant" context is included in the state space description, and the conditional probabilities are
indeed independent of further context. The fundamental assumption of the DRAGON system is
that it is possible to meet this prescription and still have a state space of manageable size.

Under the assumptions of equations (5) and (6) we have

(7)  $\mathrm{PROB}(X[1:s]=x[1:s]) = \mathrm{PROB}(X(1)=x(1)) \prod_{t=2}^{s} a_{x(t-1),x(t)}$.

(The a priori probability of a given internal state sequence is the product of the transition
probabilities for all the transitions in the sequence.) To simplify, add a special extra state to the
Markov process: let x(0) be this special state and define $a_{x(0),j}$ = PROB( X(1)=j ). Similar
conventions are assumed throughout this thesis, unless specifically mentioned otherwise. Then

(8)  $\mathrm{PROB}(X[1:s]=x[1:s]) = \prod_{t=1}^{s} a_{x(t-1),x(t)}$.

Also

(9)  $\mathrm{PROB}(Y[1:s]=y[1:s] \mid X[1:s]=x[1:s]) = \prod_{t=1}^{s} b_{x(t-1),x(t),y(t)}$

(the model-defined probability of an external sequence, conditional on the internal sequence)
where $b_{x(0),j,k}$ is defined appropriately. Combining (8) and (9) yields

(10)  $\mathrm{PROB}(X[1:s]=x[1:s],\ Y[1:s]=y[1:s]) = \prod_{t=1}^{s} a_{x(t-1),x(t)}\, b_{x(t-1),x(t),y(t)}$

(the joint probability of an internal sequence and an external sequence as given by the Markov
model).

To make possible the efficient computation of the sums in equations (3) and (4), we introduce
the probabilities of partial sequences of states and observations ([B8]). Using (2) with t=T=s and
using (10), we can set

(11)  $\alpha(s,x(s)) = \mathrm{PROB}(X(s)=x(s),\ Y[1:s]=y[1:s])$
        $= \sum_{x[1:s-1]} \prod_{t=1}^{s} a_{x(t-1),x(t)}\, b_{x(t-1),x(t),y(t)}$

where the sum is over all possible sequences x[1:s-1]. (This is the joint probability of the partial
external sequence, up to time s, and the event that the process is in state x(s) at time s.) Let

(12)  $\beta(s,x(s)) = \mathrm{PROB}(Y[s+1:T]=y[s+1:T] \mid X(s)=x(s))$
        $= \sum_{x[s+1:T]} \prod_{t=s+1}^{T} a_{x(t-1),x(t)}\, b_{x(t-1),x(t),y(t)}$

where the sum is over all possible sequences x[s+1:T]. (This is the probability of the partial
external sequence from time s+1 to the end, conditional on the event that the process is in state x(s)
at time s; the product contains no factor for the probability of reaching x(s), so that
$\alpha(s,j)\beta(s,j)$ is the joint probability of X(s)=j and the entire observed sequence.) The benefit of
introducing the functions $\alpha$ and $\beta$ is that the values of $\alpha(s,j)$ for a given s can be
computed from the values of $\alpha(s-1,j)$. Similarly, $\beta$ for a given s can be computed from the values
of $\beta$ for s+1.

RECOGNITION  EQUATIONS 
In fact

(13)  $\alpha(s,j) = \sum_i \alpha(s-1,i)\, a_{i,j}\, b_{i,j,y(s)}$

(because every sequence x[1:s] must have x(s-1)=i for some i)
and

(14)  $\beta(s,j) = \sum_i \beta(s+1,i)\, a_{j,i}\, b_{j,i,y(s+1)}$.

But $\alpha(T,j)$ = PROB( X(T)=j, Y[1:T]=y[1:T] ), hence

(15)  $\mathrm{PROB}(Y[1:T]=y[1:T]) = \sum_j \alpha(T,j)$.

We can compute the conditional probability distribution for X(t)

(16)  $\mathrm{PROB}(X(t)=j \mid Y[1:T]=y[1:T])$
        $= \mathrm{PROB}(X(t)=j,\ Y[1:T]=y[1:T]) \,/\, \mathrm{PROB}(Y[1:T]=y[1:T])$
        $= \alpha(t,j)\,\beta(t,j) \,/\, \sum_i \alpha(T,i)$
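
Equations (13) through (16) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the thesis implementation: state 0 plays the role of the special extra state x(0), the toy matrices `a` and `b` are invented, and the function names are hypothetical. The list `y` is 0-based, so `y[s - 1]` holds the observation y(s).

```python
def forward(a, b, y):
    """alpha[s][j] per equation (13); alpha[0] puts all mass on the extra state 0."""
    n = len(a)
    alpha = [[0.0] * n for _ in range(len(y) + 1)]
    alpha[0][0] = 1.0
    for s in range(1, len(y) + 1):
        for j in range(n):
            alpha[s][j] = sum(alpha[s - 1][i] * a[i][j] * b[i][j][y[s - 1]]
                              for i in range(n))
    return alpha


def backward(a, b, y):
    """beta[s][j] per equation (14); beta[T][j] = 1 (an empty product)."""
    n = len(a)
    beta = [[1.0] * n for _ in range(len(y) + 1)]
    for s in range(len(y) - 1, -1, -1):
        for j in range(n):
            beta[s][j] = sum(beta[s + 1][i] * a[j][i] * b[j][i][y[s]]
                             for i in range(n))
    return beta


def state_posteriors(a, b, y):
    """Equation (15) for the likelihood and equation (16) for the posteriors."""
    alpha, beta = forward(a, b, y), backward(a, b, y)
    likelihood = sum(alpha[-1])
    post = [[alpha[t][j] * beta[t][j] / likelihood for j in range(len(a))]
            for t in range(len(y) + 1)]
    return post, likelihood


# Toy model (illustrative numbers): state 0 is never re-entered.
a = [[0.0, 0.6, 0.4],
     [0.0, 0.7, 0.3],
     [0.0, 0.2, 0.8]]
emit = [[0.5, 0.5], [0.9, 0.1], [0.2, 0.8]]        # emission depends on the state entered
b = [[list(emit[j]) for j in range(3)] for _ in range(3)]
y = [0, 0, 1, 1, 0]

post, likelihood = state_posteriors(a, b, y)
for t in range(1, len(y) + 1):
    print(t, [round(p, 3) for p in post[t]])
```

Each pass costs on the order of T times the square of the number of states for dense matrices, in contrast to the exponential enumeration implied by equation (4).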

In speech recognition problems, we usually want to know the particular sequence x[1:T] which
maximizes the joint probability PROB( X[1:T]=x[1:T], Y[1:T]=y[1:T] ). Again, the problem can
be solved by induction from partial sequences ([B9]). Let

(17)  $\gamma(t,j) = \max_{x[1:t-1]} \mathrm{PROB}(X[1:t-1]=x[1:t-1],\ X(t)=j,\ Y[1:t]=y[1:t])$.

Then $\gamma$ may be computed by

(18)  $\gamma(t,j) = \max_i \gamma(t-1,i)\, a_{i,j}\, b_{i,j,y(t)}$.

Notice that equation (18) is just like equation (13) except that Max has been substituted for $\sum$. It
is convenient to save "back-pointers" while computing $\gamma$. Therefore, let I(t,j) be any value of i for
which the maximum is achieved in equation (18). Then a sequence x[1:T] for which
PROB( X[1:T]=x[1:T], Y[1:T]=y[1:T] ) is maximized is obtained by

(19)  $x(T) = j$, where j is any index such that $\gamma(T,j) = \max_i \gamma(T,i)$

and

(20)  $x(t) = I(t+1,\ x(t+1))$,  $t = T-1, T-2, \ldots, 2, 1$.
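
The recursion of equation (18) and the back-pointer trace of equations (19) and (20) can be sketched as follows. The toy model and the names `viterbi` and `joint_probability` are assumptions for illustration; state 0 stands for the special extra state x(0), and the brute-force comparison at the end is included only to check the sketch on a small case.

```python
from itertools import product


def viterbi(a, b, y):
    """Return (best state sequence x[1:T], its joint probability with y)."""
    n, T = len(a), len(y)
    gamma = [[0.0] * n for _ in range(T + 1)]
    back = [[0] * n for _ in range(T + 1)]          # back[t][j] plays the role of I(t,j)
    gamma[0][0] = 1.0                               # start in the extra state x(0)
    for t in range(1, T + 1):
        for j in range(n):
            for i in range(n):
                score = gamma[t - 1][i] * a[i][j] * b[i][j][y[t - 1]]   # equation (18)
                if score > gamma[t][j]:
                    gamma[t][j], back[t][j] = score, i
    x = [0] * (T + 1)
    x[T] = max(range(n), key=lambda j: gamma[T][j])                     # equation (19)
    for t in range(T - 1, 0, -1):                                       # equation (20)
        x[t] = back[t + 1][x[t + 1]]
    return x[1:], gamma[T][x[T]]


def joint_probability(a, b, x, y):
    """Equation (10): product of a and b along one state sequence, with x(0) = 0."""
    p, prev = 1.0, 0
    for x_t, y_t in zip(x, y):
        p *= a[prev][x_t] * b[prev][x_t][y_t]
        prev = x_t
    return p


# Toy model (illustrative numbers only).
a = [[0.0, 0.6, 0.4],
     [0.0, 0.7, 0.3],
     [0.0, 0.2, 0.8]]
emit = [[0.5, 0.5], [0.9, 0.1], [0.2, 0.8]]
b = [[list(emit[j]) for j in range(3)] for _ in range(3)]
y = [0, 0, 1, 1, 0]

path, best = viterbi(a, b, y)
brute = max(joint_probability(a, b, x, y) for x in product(range(3), repeat=len(y)))
print(path, best, brute)
```

Only one score and one back-pointer are kept per state and time step, which is why the search remains linear in the length of the utterance.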


So far the analysis has assumed that the matrices A and B are fixed and known. However, if
A and B are not known but must be estimated, then the $\alpha$ and $\beta$ computed above may be used to
obtain a Bayesian a posteriori re-estimation of A and B. The matrix A is re-estimated by

(21)  $\hat{a}_{i,j} = \dfrac{\sum_{t=1}^{T-1} \mathrm{PROB}(X(t)=i,\ X(t+1)=j \mid Y[1:T]=y[1:T],\ \{a_{i,j}\},\ \{b_{i,j,k}\})}{\sum_{t=1}^{T-1} \mathrm{PROB}(X(t)=i \mid Y[1:T]=y[1:T],\ \{a_{i,j}\},\ \{b_{i,j,k}\})}$
        $= \dfrac{\sum_{t=1}^{T-1} \alpha(t,i)\, a_{i,j}\, b_{i,j,y(t+1)}\, \beta(t+1,j)}{\sum_{t=1}^{T-1} \alpha(t,i)\, \beta(t,i)}$

The matrix B is re-estimated by

(22)  $\hat{b}_{i,j,k} = \dfrac{\sum_{t=1,\ y(t+1)=k}^{T-1} \mathrm{PROB}(X(t)=i,\ X(t+1)=j \mid Y[1:T]=y[1:T],\ \{a_{i,j}\},\ \{b_{i,j,k}\})}{\sum_{t=1}^{T-1} \mathrm{PROB}(X(t)=i,\ X(t+1)=j \mid Y[1:T]=y[1:T],\ \{a_{i,j}\},\ \{b_{i,j,k}\})}$
        $= \dfrac{\sum_{t=1,\ y(t+1)=k}^{T-1} \alpha(t,i)\, a_{i,j}\, b_{i,j,k}\, \beta(t+1,j)}{\sum_{t=1}^{T-1} \alpha(t,i)\, a_{i,j}\, b_{i,j,y(t+1)}\, \beta(t+1,j)}$

In fact it can be shown ([BS]) that

(23)  $\mathrm{PROB}(Y[1:T]=y[1:T] \mid \{\hat{a}_{i,j}\},\ \{\hat{b}_{i,j,k}\}) \ge \mathrm{PROB}(Y[1:T]=y[1:T] \mid \{a_{i,j}\},\ \{b_{i,j,k}\})$.

Thus, each time the re-estimation equations (21) and (22) are used, new matrices are obtained
such that the estimated probability of the observations Y[1:T]=y[1:T] is non-decreasing. Since
this estimated probability is a continuous function of the matrix entries (in fact, a polynomial with
terms as given by equation (10)), and since the matrix entries are constrained to a compact set
(because the entries are non-negative and the row sums are 1), this estimated probability must
converge for any sequence of matrices obtained by repeated use of the re-estimation equations.
Hence the re-estimation given by equations (21) and (22) may be used repeatedly in an attempt to
obtain { $a_{i,j}$ } and { $b_{i,j,k}$ } which maximize PROB( Y[1:T]=y[1:T] | { $a_{i,j}$ }, { $b_{i,j,k}$ } ). Thus we can
obtain an approximation to maximum likelihood estimates for { $a_{i,j}$ } and { $b_{i,j,k}$ }.

In re-estimating the matrices A and B, the special structure of the speech recognition problem
can be used to good advantage. Although it is convenient to use a single integrated model for the
actual analysis and recognition of utterances, the re-estimation of the structural matrices can be
performed separately for each of the levels in the hierarchy. Also note that any entry in A or B
which is zero remains zero in the re-estimations of equations (21) and (22). Therefore we are able
to maintain and utilize the sparseness of these matrices in the re-estimation process.
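
A single pass of the re-estimation equations (21) and (22) might be sketched as below. This is an illustration under several assumptions rather than the procedure used in the thesis: state 0 stands for the special extra state, the sums run over t = 1, ..., T-1 so that the transitions out of state 0 are left unchanged, rows with no expected occupancy keep their old values, and the toy numbers and function names are invented.

```python
def forward(a, b, y):
    n = len(a)
    alpha = [[0.0] * n for _ in range(len(y) + 1)]
    alpha[0][0] = 1.0
    for s in range(1, len(y) + 1):
        for j in range(n):
            alpha[s][j] = sum(alpha[s - 1][i] * a[i][j] * b[i][j][y[s - 1]]
                              for i in range(n))
    return alpha


def backward(a, b, y):
    n = len(a)
    beta = [[1.0] * n for _ in range(len(y) + 1)]
    for s in range(len(y) - 1, -1, -1):
        for j in range(n):
            beta[s][j] = sum(beta[s + 1][i] * a[j][i] * b[j][i][y[s]]
                             for i in range(n))
    return beta


def reestimate(a, b, y):
    """One application of equations (21) and (22); returns new copies of A and B."""
    n, n_symbols, T = len(a), len(b[0][0]), len(y)
    alpha, beta = forward(a, b, y), backward(a, b, y)
    new_a = [row[:] for row in a]
    new_b = [[col[:] for col in row] for row in b]
    times = range(1, T)                                  # t = 1 .. T-1; y[t] holds y(t+1)
    for i in range(n):
        occupancy = sum(alpha[t][i] * beta[t][i] for t in times)
        for j in range(n):
            xi = [alpha[t][i] * a[i][j] * b[i][j][y[t]] * beta[t + 1][j] for t in times]
            total = sum(xi)
            if occupancy > 0.0:
                new_a[i][j] = total / occupancy          # equation (21)
            if total > 0.0:
                for k in range(n_symbols):               # equation (22)
                    new_b[i][j][k] = sum(v for t, v in zip(times, xi) if y[t] == k) / total
    return new_a, new_b


# Toy starting point (illustrative numbers only).
a = [[0.0, 0.5, 0.5],
     [0.0, 0.6, 0.4],
     [0.0, 0.3, 0.7]]
emit = [[0.5, 0.5], [0.7, 0.3], [0.4, 0.6]]
b = [[list(emit[j]) for j in range(3)] for _ in range(3)]
y = [0, 0, 1, 0, 1, 1, 1, 0, 0, 0]

likelihoods = []
for _ in range(5):
    likelihoods.append(sum(forward(a, b, y)[-1]))
    a, b = reestimate(a, b, y)
print(likelihoods)
```

The printed likelihoods form a non-decreasing sequence, and entries that start at zero (here the transitions into state 0) stay at zero, which is the sparseness property noted above.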

INTRODUCTION

Each of the knowledge sources in a speech recognition system can be represented in terms of
the general model of Chapter II. The total hierarchical system also fits such a model, and it is the
total system to which the estimation procedures of Chapter II are applied. This chapter explains
the representation of knowledge from each of the sources and their integration into the hierarchy.


REPRESENTATION OF ACOUSTIC-PHONETIC KNOWLEDGE

There are several choices as to how to represent acoustic-phonetic knowledge. A decision
must be made whether acoustic observations should be preprocessed by specialized procedures or
whether the stochastic model should deal directly with the acoustic parameters. The
representation problem is easier assuming specialized preprocessing, so consider this case first.

Assume that at each time t ( $1 \le t \le T$ ), an acoustic observation is made. Each such
observation consists of a vector of values of a set of acoustic parameters, which in the stochastic
model is represented by a vector-valued random variable Y(t). There is a sequence of phones
P[1:J] which is produced during the time interval $1 \le t \le T$. Assume that the phones occupy
disjoint segments of time; that is, assume there is a sequence $s_0 < s_1 < s_2 < s_3 < \ldots < s_J$ such that
P(j) lasts from observation Y($s_{j-1}$) through observation Y($s_j - 1$). (Set $s_0 = 1$, $s_J = T$.)

Let p[1:J] be the actual sequence of phones in an utterance and let y[1:T] be the actual
observed sequence of acoustic parameters. For convenience, also introduce a special initialization
phone p(0) which is assigned a special value to allow the initial probabilities to have the same form
as the transition probabilities later in the sequence. Since the actual times $s_1, s_2, s_3, \ldots, s_{J-1}$ are not
known, it is necessary to associate each arbitrary segment of time with some phone. For each pair
of times $t_1$ and $t_2$ let $S(t_1,t_2)$ be that value of j for which the expression ( Min($s_j$, $t_2$) - Max($s_{j-1}$, $t_1$) ) is
maximized. (That is, we associate with the pair $t_1$ and $t_2$ the index of the phone segment which has
the greatest interval in common with the interval from $t_1$ to $t_2$.) If $t_2 < t_1$ then set $S(t_1,t_2) = 0$.
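
The definition of S can be written out directly. In this minimal sketch the function name `overlap_index` and the example boundary list are hypothetical; the list `s` holds the boundaries $s_0, s_1, \ldots, s_J$, and ties are resolved in favor of the smallest index, which is one arbitrary choice among those the definition allows.

```python
def overlap_index(s, t1, t2):
    """S(t1, t2): index j of the phone segment sharing the most time with [t1, t2].

    s is the list [s_0, s_1, ..., s_J] of true segment boundaries.
    """
    if t2 < t1:
        return 0
    return max(range(1, len(s)),
               key=lambda j: min(s[j], t2) - max(s[j - 1], t1))


boundaries = [1, 5, 9, 14]                 # three phones: [1,5), [5,9), [9,14)
print(overlap_index(boundaries, 6, 8))     # interval inside the second phone
print(overlap_index(boundaries, 3, 12))    # interval spanning all three phones
print(overlap_index(boundaries, 8, 3))     # t2 < t1
```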


The acoustic preprocessor tries to estimate a phonetic transcription from the acoustics alone.
By looking for discontinuities or rapid changes in the acoustic parameters, the preprocessor divides
the sequence up into K phone-like segments Y[1:$t_1$-1], Y[$t_1$:$t_2$-1], Y[$t_2$:$t_3$-1], ..., Y[$t_{K-1}$:$t_K$-1].
Then an attempt is made to classify each segment Y[$t_{k-1}$:$t_k$-1] using some form of pattern
recognition procedure. Let $t_1 < t_2 < t_3 < \ldots < t_K$ be the segment boundary times as decided by the
preprocessor and introduce the random variable D(t) which is 1 if there exists a k such that $t_k = t$
and is 0 otherwise. Let F(k) be the label assigned by the preprocessor to the segment
Y[$t_{k-1}$:$t_k$-1]. (For completeness, set $t_k = t_0 = 1$ for k < 0, and $t_k = t_K = T$ for k > K.)

With some pattern matching procedures it is possible to directly estimate conditional
probabilities. When using such a procedure, let

(1)  $B(p,k) = \mathrm{PROB}(Y[t_{k-1}:t_k-1]=y[t_{k-1}:t_k-1] \mid P(S(t_{k-1},t_k))=p)$

(the probability that segment k corresponds to phone p as estimated by the pattern matching
procedure). On the other hand, the pattern matching procedure might yield only a label F(k)
representing a best guess as to the underlying phone. In such a case, it is necessary to estimate the
conditional probabilities from statistics of performance of the pattern matcher on hand-labeled
data. Let f[1:K] represent the actual sequence of labels generated by the pattern recognizer for
the utterance being considered. Then set

(2)  $B(p,k) = \mathrm{PROB}(F(k)=f(k) \mid P(S(t_{k-1},t_k))=p)$.

(The probability that segment k corresponds to phone p is estimated as the probability that a
segment labeled f(k) corresponds to phone p.) Here the conditional probability is estimated by
the frequency of such events in a set of training utterances.
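
The frequency estimate behind equation (2) can be sketched as a simple count over hand-labeled data. Everything concrete here is an assumption for illustration: the function name `estimate_label_given_phone`, the phone symbols, and the four training pairs are invented, and no smoothing of unseen pairs is attempted.

```python
from collections import Counter


def estimate_label_given_phone(pairs):
    """Relative-frequency estimate of PROB(F=f | P=p).

    pairs: iterable of (true_phone, preprocessor_label) taken from training
           utterances that have been hand-labeled.
    Returns a dict mapping (true_phone, label) to the estimated probability.
    """
    pairs = list(pairs)
    pair_counts = Counter(pairs)
    phone_counts = Counter(p for p, _ in pairs)
    return {(p, f): c / phone_counts[p] for (p, f), c in pair_counts.items()}


# Hypothetical training statistics.
training = [("AA", "AA"), ("AA", "AA"), ("AA", "AH"), ("IY", "IY")]
confusion = estimate_label_given_phone(training)


def B(p, label):
    """B(p,k) of equation (2) for a segment whose preprocessor label is `label`."""
    return confusion.get((p, label), 0.0)


print(B("AA", "AA"), B("AA", "AH"), B("IY", "AA"))
```

Pairs never seen in training receive probability zero in this sketch; a practical system would need some way of handling such cases.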

In addition to estimating the probability of substitutions or confusions, it is necessary to
estimate the probability of the preprocessor producing either too many or too few segments. The
probability of such events may be estimated from their frequency of occurrence in a set of training
utterances. Let

(3)  $E(p_1,p_2,n) = \mathrm{PROB}(D(t_{k-2})=D(t_{k-1})=D(t_k)=1,\ D[t_{k-2}+1:t_{k-1}-1]=0,\ D[t_{k-1}+1:t_k-1]=0 \mid$
        $P(S(t_{k-2},t_{k-1}))=p_1,\ P(S(t_{k-1},t_k))=p_2,\ S(t_{k-1},t_k)=S(t_{k-2},t_{k-1})+n)$.

(The probability that the segmenter finds one boundary between a segment corresponding to
phone $p_1$ and a segment corresponding to phone $p_2$, given that the phones are actually n positions
apart in the sequence of phones.) If the acoustic preprocessor is reliable, then $E(p_1,p_2,n)$ should be
small except for n=1 and should be negligible for n>2. In an implementation of the DRAGON
system which uses an acoustic preprocessor, it has arbitrarily been assumed that $E(p_1,p_2,n) = 0$ for
n>4. Note that $E(p_1,p_2,0)$ is undefined and meaningless unless $p_1 = p_2$.

We can now estimate the conditional probability of the sequence Y[1:T] given the sequence
P[0:J]:

(4)  $\mathrm{PROB}(Y[1:T]=y[1:T] \mid P[0:J]=p[0:J])$
       $= \sum_{n[1:K]} \prod_{k=1}^{K} B(p(z(k)),k)\, E(p(z(k-1)),\ p(z(k)),\ n(k))$,

where $z(k) = \sum_{i=1}^{k} n(i)$ and the sum is taken over all sequences n[1:K] such that z(K) = J. (By
convention z(0) = 0.) This equation is a special case of equation (9) of Chapter II.

In order to apply the theory of a probabilistic function of a Markov process, it is necessary to
specify the transition probabilities for the phone sequence P[1:J]. It is the task of the other
sources of knowledge to specify these probabilities. Phonological rules may be represented either
directly or indirectly in the estimates of $E(p_1,p_2,n)$ and $B(p,k)$, but all higher levels of the hierarchy
deal only with the sequence P[1:J] and are insulated from the acoustics Y[1:T] or the labels
F[1:K].

Even if no special preprocessing is assumed, it is not difficult to represent the
acoustic-phonetic knowledge, but there is a penalty of extra computation. Direct estimation of the
conditional probability PROB( Y[1:T]=y[1:T] | P[1:J]=p[1:J] ) is similar to the problem of
machine-aided segmentation and labeling ([B2]). Similar algorithms have also been used for
word-spotting in continuous speech ([B4], [B11]) and for isolated word recognition ([I1]). The
essential idea is an elastic change of the time scale to optimally match a sequence of acoustic
observations to a sequence of prototypes.

To relate the phones to the acoustic observations requires knowledge of the acoustic
phenomena which are expected with each phone. In line with the probabilistic approach, each phone is
assumed to be associated with a stochastic process which produces acoustic parameter values for
each instance of the phone. The statistical properties of the stochastic process associated with any
particular phone are to be estimated from occurrences of the phone in a set of training utterances
which have already been segmented and labeled.

Each acoustic observation is to take a value from a finite set D. Assume that for each phone p
there is a positive-integer-valued random variable $Z_p$ and a family of random variables $X_p(1)$,
$X_p(2)$, $X_p(3)$, ..., $X_p(Z_p)$ with values in D. Let $f_{p,n}$ be the conditional probability function

(5)  $f_{p,n}(x(1),x(2),x(3),\ldots,x(n)) = \mathrm{PROB}(X_p[1:n]=x[1:n] \mid Z_p=n)$.

Let $g_p(n) = \mathrm{PROB}(Z_p=n)$. The interpretation is that $Z_p$ is the duration of an instance of phone p
and $X_p[1:Z_p]$ are the acoustic observations made during that instance of p.

Let y[1:T] be the sequence of observations made for the utterance being analyzed. Let p[1:J]
be the sequence of phones in the utterance. Let U[1:J] be the sequence of boundary times for the
phones. That is, U(1) < U(2) < U(3) < ... < U(J) and, for each j, P(j) lasts from observation
Y(U(j-1)) to observation Y(U(j)-1). Suppose a set of observations Y[1:T] and times U[1:J] are
produced by applying in succession the stochastic processes for each of the phones P(1) through
P(J) and concatenating the observations, the individual processes being independent. Then the
probability of producing the observed sequence is

(6)  $\mathrm{PROB}(Y[1:T]=y[1:T],\ U[1:J]=u[1:J] \mid P[1:J]=p[1:J])$
       $= \prod_{j=1}^{J} \left( f_{p(j),\,u(j)-u(j-1)}(y[u(j-1):u(j)-1])\; g_{p(j)}(u(j)-u(j-1)) \right)$.

The segmentation and labeling problem consists of finding the correct set of values for the
sequence U[1:J]. Representing the acoustic-phonetic knowledge in a speech recognition system is
similar, except the transitions among the phones are determined by probabilities specified by other
sources of knowledge rather than being a known sequence.


Note that our model is such that for a given $k$ and $u[k:J]$ we can evaluate

(8)  $\mathrm{PROB}(\, Y[u(k):T]=y[u(k):T],\ U[k:J]=u[k:J] \mid P[1:J]=p[1:J] \,)$
     $= \prod_{j=k+1}^{J} \big( f_{p(j),\,u(j)-u(j-1)}(y[u(j-1):u(j)-1])\; g_{p(j)}(u(j)-u(j-1)) \big);$

that is, the probability does not depend on $U[1:k-1]$. The process is an example of a probabilistic
function of a Markov process with the vector $(k, U(k))$ being the state variable of the Markov
process. The problem of machine-aided labeling can be solved by the techniques of Chapter II.

Introduce the function

(9)  $\gamma_1(j,t) = \max_{u[1:j],\ u(j)=t} \mathrm{PROB}(\, Y[1:t-1]=y[1:t-1],\ U[1:j]=u[1:j] \mid P[1:J]=p[1:J] \,).$

That is, $\gamma_1(j,t)$ is the probability of the best sequence leading up to the state $(j,t)$. The
function $\gamma_1$ may be calculated according to equation (18) of Chapter II. Thus

(10)  $\gamma_1(j,t) = \max_k \big( \gamma_1(j-1,\,t-k)\; f_{p(j),k}(y[t-k:t-1])\; g_{p(j)}(k) \big).$

Let $K(j,t)$ be any value of $k$ for which this maximum is achieved. Then after $\gamma_1$ and $K(j,t)$
have been calculated for all $j$ and $t$, the best sequence $u[1:J]$ is obtained by

(11)  $u(j) = u(j+1) - K(j+1,\ u(j+1))$
where $u(J) = T$.
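The recursion of equations (10) and (11) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the program used in this work: indices are 0-based with half-open segments, so `gamma[j][t]` is the best probability that the first `j` phones account for `y[:t]`; the names `segment`, `seg_prob` (playing the role of $f_{p,k}$), and `dur_prob` (playing the role of $g_p$) are hypothetical, and plain probabilities are used instead of logarithms, which is adequate only for short examples.

```python
from math import prod

def segment(y, phones, seg_prob, dur_prob, max_dur=None):
    """Best segmentation of y into the given phone sequence.

    Returns (best_prob, ends), where ends[j] is the index one past the last
    observation assigned to phones[j] (0-based, half-open convention).
    """
    T, J = len(y), len(phones)
    max_dur = max_dur or T
    # gamma[j][t]: best probability that the first j phones cover y[:t]
    gamma = [[0.0] * (T + 1) for _ in range(J + 1)]
    back = [[0] * (T + 1) for _ in range(J + 1)]   # maximizing duration k
    gamma[0][0] = 1.0
    for j in range(1, J + 1):
        p = phones[j - 1]
        for t in range(j, T + 1):
            for k in range(1, min(max_dur, t) + 1):
                prev = gamma[j - 1][t - k]
                if prev == 0.0:
                    continue
                score = prev * seg_prob(p, y[t - k:t]) * dur_prob(p, k)
                if score > gamma[j][t]:
                    gamma[j][t], back[j][t] = score, k
    # Trace back the boundaries, as in equation (11).
    ends, t = [0] * J, T
    for j in range(J, 0, -1):
        ends[j - 1] = t
        t -= back[j][t]
    return gamma[J][T], ends

# Toy example (all values hypothetical): two phones over the alphabet {x, y}.
emit = {"a": {"x": 0.9, "y": 0.1}, "b": {"x": 0.1, "y": 0.9}}
best, ends = segment(
    "xxxyy", ["a", "b"],
    seg_prob=lambda p, seg: prod(emit[p][o] for o in seg),  # i.i.d. emissions
    dur_prob=lambda p, k: 0.5 ** k,                          # geometric duration
)
print(ends)  # boundary after the third observation, then end of utterance
```

The triple loop costs on the order of J x T x (maximum duration) evaluations of `seg_prob`, which is the expense that the simplified model of equation (13) avoids.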


If we are willing to assume that $X_p(1), X_p(2), X_p(3), \ldots, X_p(Z_p)$ are independent and
identically distributed and that

(12)  $g_p(n) = (1-a)\,a^n$, for some $a$ independent of $p$,

then an even simpler computation is possible. It is not claimed that these additional assumptions
are realistic (the acoustic properties of real phones are much more complicated). However, they
do produce reasonable results with a great savings in computation.
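As a check on what assumption (12) buys, the duration terms of equation (6) can be multiplied out. The short derivation below assumes, as a labeled convention, that the phone durations $u(j)-u(j-1)$ sum to $T$; under that assumption the product of duration probabilities is the same for every segmentation.

```latex
\prod_{j=1}^{J} g_{p(j)}\bigl(u(j)-u(j-1)\bigr)
  = \prod_{j=1}^{J} (1-a)\, a^{\,u(j)-u(j-1)}
  = (1-a)^{J}\, a^{\sum_{j=1}^{J} \left(u(j)-u(j-1)\right)}
  = (1-a)^{J}\, a^{T}
```

Because this factor does not depend on the boundaries, it cannot change which segmentation attains the maximum.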

The extra assumptions allow us to ignore the durations of the phones by factoring out a factor
which is the same for all sequences $u[1:J]$, namely the factor $(1-a)^J a^T$. Let's reformulate the
Markov process, ignoring duration information. Let the state $(j,t)$ correspond to the event
$U(j-1) \le t < U(j)$ with $U(j-1)$ otherwise unrestricted (time $t$ occurs during phone $P(j)$). Let
$\gamma_2(j,t)$ be the probability for the best sequence leading up to the state $(j,t)$ and producing
the sequence $y[1:t]$. Then $\gamma_2$ may be calculated by

(13)  $\gamma_2(j,t) = \max\big( \gamma_2(j-1,\,t-1),\ \gamma_2(j,\,t-1) \big)\ \mathrm{PROB}(\, X_{p(j)} = y(t) \,).$

Then the sequence $u[1:J]$ may be calculated by

(14)  $u(j)$ = (the greatest integer value of $t$
      such that $t < u(j+1)$ and $\gamma_2(j-1,\,t-1) > \gamma_2(j,\,t-1)$ ).
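A minimal sketch of the simplified computation of equation (13) follows, again as an illustration rather than the original program. It assumes 0-based indices, a hypothetical table `emit[p][o]` for PROB(X_p = o), and it returns the first time index of each phone; that start-time convention is chosen for the sketch and is not claimed to match the indexing of equation (14) exactly.

```python
def align(y, phones, emit):
    """Duration-free alignment: each frame either stays in the current
    phone or advances to the next one, as in equation (13).

    Returns (best_prob, starts), where starts[j] is the first time index
    (0-based) assigned to phones[j].
    """
    T, J = len(y), len(phones)
    g = [[0.0] * T for _ in range(J)]                 # g[j][t] ~ gamma_2
    from_prev = [[False] * T for _ in range(J)]       # True if best path advanced
    g[0][0] = emit[phones[0]][y[0]]
    for t in range(1, T):
        for j in range(J):
            stay = g[j][t - 1]
            advance = g[j - 1][t - 1] if j > 0 else 0.0
            from_prev[j][t] = advance > stay
            g[j][t] = max(stay, advance) * emit[phones[j]][y[t]]
    # Walk backwards from the final state, recording where each phone began.
    starts, j = [0] * J, J - 1
    for t in range(T - 1, 0, -1):
        if from_prev[j][t]:
            starts[j] = t
            j -= 1
    return g[J - 1][T - 1], starts

# Toy example with hypothetical emission probabilities.
emit2 = {"a": {"x": 0.9, "y": 0.1}, "b": {"x": 0.1, "y": 0.9}}
prob2, starts2 = align("xxxyy", ["a", "b"], emit2)
print(starts2)
```

Each cell needs only two comparisons, so the cost is proportional to J x T, independent of phone duration.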

In machine-aided labeling it is only necessary to consider a single sequence $p[1:J]$. In a speech
recognition problem, we wish to maximize not only over all possible sequences $u[1:J]$ but also over
all possible phonetic sequences $p[1:J]$, subject to the transition probabilities determined by the
higher levels of the hierarchy. The computation of a function like $\gamma_1$ or $\gamma_2$ is not
performed separately at the acoustic level, but is performed on a Markov process representing the
integrated hierarchy.

REPRESENTATION  OF  LEXICAL  KNOWLEDGE  AND  PHONOLOGICAL  RULES 

This section discusses the computation of the conditional probability
$\mathrm{PROB}(\, P[1:J]=p[1:J] \mid W[1:I]=w[1:I] \,)$ where $W[1:I]$ is the sequence of words in the
utterance and $P[1:J]$ is the sequence of phones. Each word is represented by an abstract network to
which we may apply the re-estimation procedure of equations (21) and (22) of Chapter II. The
prototype word network consists of several columns of nodes (to simplify the discussion, assume that
there are exactly two nodes per column) with each node connected to itself and to every node in its
column and in the two following columns. Such a network is shown in Figure 1, where only the arcs
leaving from one particular node have been shown.

If each node corresponds to a phone, then an arc which stays in the same column represents
insertion of an extra segment. At this level we are primarily interested in representing insertions
(and other phonological phenomena) made by the speaker, but as already mentioned there is
always a choice between representing a given phenomenon at this level (where word-level context
is known) or at the acoustic-phonetic level (where only one phone of context is known). An arc
which skips a column represents a missed or deleted segment.

FIGURE 1: GENERAL WORD PROTOTYPE (diagram of the prototype word network)

Let $Y(t)$ be the phone which occurs at time $t$. Note that in this hierarchical system, the
sequence which is the (unobserved) internal sequence at one level is the external sequence for the
next higher level. Whether the acoustic level assumes a preprocessor or not, this next level
assumes as its external sequence a sequence of phones (except there are several phenomena which
could be represented at either level). Let $X(t) = (X_1(t), X_2(t))$ be the internal state in our
abstract word model, where

    $1 \le X_1(t) \le C$,  $X_1(t)$ = column number at time $t$
    $1 \le X_2(t) \le R$,  $X_2(t)$ = row number at time $t$

where $C$ is the number of columns in the abstract model and $R$ is the number of rows. For the
purpose of this discussion, we take $C$ fixed at the number of phonemes in the canonical version of
the word (stored in a dictionary) and take $R$ fixed at 2. Various values of $C$ and $R$ can be used
and tested against the actual data.

This  abstract  network  with  the  associated  conditional  probabilities  represents  the  probability 
distribution  of  possible  pronunciations  of  the  word.  We  assume  that  the  phonetic  sequences 
corresponding  to  instances  of  the  word  are  generated  by  a Markov  process.  Let 

(15)  $A(\,(c_1,r_1),\ (c_2,r_2)\,) = \mathrm{PROB}(\, X(t)=(c_2,r_2) \mid X(t-1)=(c_1,r_1) \,)$

(16)  $B(\,(c,r),\ p\,) = \mathrm{PROB}(\, Y(t)=p \mid X(t)=(c,r) \,)$

If we are given a collection of instances of a particular word W, and have estimates for A and B,
we can use equations (21) and (22) to re-estimate A and B for the word W. Phonological rules
which produce extra segments or deleted segments are represented by A and substitutions are
represented by B. Phonological rules which apply across word boundaries can be represented by
having several extra states at the beginning and end of each word and having the initial probability
distribution depend on the context.
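The topology of the prototype word network described above can be written down directly. The sketch below is illustrative only: the function names are hypothetical, nodes are (column, row) pairs numbered from 1, and the uniform matrix A is merely an assumed starting point for a re-estimation procedure, not the estimate used in this work.

```python
from itertools import product

def prototype_arcs(C, R=2):
    """Arcs of the prototype word network with C columns and R rows.

    Each node (c, r) is connected to itself, to every node in its own
    column, and to every node in the two following columns.
    """
    nodes = list(product(range(1, C + 1), range(1, R + 1)))
    return {
        (c, r): [(c2, r2) for (c2, r2) in nodes if c <= c2 <= c + 2]
        for (c, r) in nodes
    }

def uniform_A(arcs):
    """Assumed initial transition matrix: equal probability on every
    permitted arc leaving a node."""
    return {src: {dst: 1.0 / len(dsts) for dst in dsts}
            for src, dsts in arcs.items()}

arcs4 = prototype_arcs(4)   # a word with four canonical phonemes
A4 = uniform_A(arcs4)
print(len(arcs4[(1, 1)]), len(arcs4[(3, 1)]), len(arcs4[(4, 2)]))
```

Arcs into the same column model inserted segments, arcs into the next column model normal progression, and arcs that skip a column model deleted segments; nodes near the end of the word simply have fewer successors.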


Several variations of this lexical model are also worth considering. If the acoustic level
estimates not just the phones but the transemes (pairs of phones as estimated by the acoustic
transition between them, as in the ARCS and IBM-Watson systems) then the lexical level should
have the distribution of $Y(t)$ depend not just on $X(t)$ but also on $X(t-1)$. It is possible to
integrate the acoustic and lexical levels and directly re-estimate the representation of a word in
terms of the acoustic parameters. This approach is being followed by Bakis. Another approach is to
obtain a network representing the possible pronunciations of a word by applying a list of
phonological rules written as production rules and applied to a baseform representation of the word.
Automatic procedures for applying such a list of rules for the purpose of speech recognition systems
have been developed by Cohen and Mercer [C1] and by Barnett [B5].

The explicit representation of phonological rules in the network is easily achieved at an
expense of doubling or tripling the number of nodes in the network. However, it is not essential
that an exhaustive set of phonological rules be used. In fact, the implementation of the DRAGON
system described in Chapter IV has no explicit phonological rules and only one canonical
pronunciation for each word. The reason that this representation is possible is that any phonological
phenomena which are not introduced explicitly will be treated at the acoustic-phonetic level. Thus
phonological substitutions can be mimicked by adjusting the probabilities in the matrices B and E
(equations (1), (2), and (3)) which represent the probabilities of substitutions and insertions and
deletions at the acoustic level. The disadvantage of this approach is that the matrices represent
less context than is available in the explicit representation of the phonological rules at the lexical
level.

There is a serendipitous benefit in using the matrices B and E to represent acoustic-phonetic
knowledge independently from the representation of the phonological rules. If the matrices B and
E are estimated by running the acoustic preprocessor on a collection of training utterances, then
any phonological rules which are left out in the prepared labeling of the training utterances are
automatically absorbed into the estimates of B and E. Thus a perfect hand-labeled transcription of
the training utterances is not only unnecessary, but undesirable. The best labeling for training
purposes is an automatically generated labeling from a procedure knowing the sequence of words
and having exactly the same lexical knowledge and phonological rules as the speech recognition
system.

REPRESENTATION OF SYNTACTIC AND SEMANTIC KNOWLEDGE

In building the integrated network, the lexical and phonological rule procedures take as input a
network representation of the syntax and semantics in which each node of the network represents
a word. It is clear that any regular (finite state) grammar can be represented by a finite network.
In a speech recognition system the distinction between a regular grammar and an arbitrary
context-free or context-dependent grammar is somewhat artificial. Consider the language
generated by a particular grammar, not the sequence of words, but the sequence of acoustic events.
It is not unreasonable to assume, for example, that the entries in the acoustic-phonetic matrix
$B(p,k)$ are all non-zero, although perhaps very small. Such a result would automatically be the
case with pattern recognition based on a posteriori probabilities if the conditional probability
distributions for the acoustic parameters are multivariate normal distributions.

But if each entry in $B(p,k)$ is non-zero, then at the acoustic level the language must include all
possible sequences. Such a language can, of course, be represented by a finite network grammar.
Thus the issue becomes not one of generating the proper language, but rather one of accurately
modeling the conditional probabilities. The conditional probabilities may be context-dependent
even for a language generated by a context-free grammar. The approach which has been used in
the DRAGON system has been to enlarge the finite grammar to allow the conditional probabilities
to be more accurately represented, but not to try to retain all of the context of the actual language.

The properties of probabilistic grammars have been studied by several investigators ([B10],
[C1], [F3], [G2], [H1], [S1], [S2], [T4]). A probabilistic finite state grammar is a special case of a
probabilistic function of a Markov process in which the entries in the matrix $\{b_{ijk}\}$ of
equation (5) of Chapter II are all zeros or ones (only the transitions are probabilistic). Thus such a
grammar can be immediately represented in terms of our general model. However, there is still the
problem of estimating the transition probabilities.

The general abstract model is not as well suited to representing semantic knowledge as it is to
representing the other sources of knowledge which have been discussed. In the implementation
described in Chapter IV, there has been no attempt to represent semantic knowledge. In fact, an
argument could be made that, since there is no process corresponding to understanding the
sentence, whatever knowledge is represented by the abstract stochastic model is of necessity not
semantic knowledge. However, it should be noted that it is not necessary for the stochastic model
to directly represent the semantic knowledge itself, but rather it is necessary for the model to
represent the influence of the semantic knowledge on the probability distributions of possible
sequences of words.

For example, it is possible to have a specialized task-specific module which is capable of
understanding the utterances of a given task and which is capable of representing the set of
utterances which are possible in a given context. The HEARSAY speech understanding system
employs such a mechanism for the VOICE CHESS task. The task is to recognize chess moves that
are spoken by a user who is playing a game of chess against the computer. The system has a
separate module consisting of a chess playing program, TECH. Not only does the TECH program
play chess with the user, but when it is the user's turn to move, TECH lists for the recognition
system all moves which are possible in the given position and even rates the moves. Thus the
TECH program provides semantic guidance for the recognition system. A similar mechanism may
be used to obtain semantic knowledge for the DRAGON system. Once the list of legal moves is
obtained and rated, this information may be used in setting the transition probabilities for the
probabilistic grammar. The fine details may be lost, but much of the information will be
represented, the quality of the representation depending on the complexity of the grammar.

There is even a mechanism by which the stochastic model can obtain some semantic
information without a specialized module. Consider the goal of mimicking a human being who is
trying to guess the next word in an utterance when given some limited amount of context. This
person, who is capable of understanding the utterance, could use whatever semantic knowledge is
available from the limited context. In this situation the semantic knowledge is more limited than
that which is used by the TECH program, which knows the entire sequence of previous moves and
hence the current board position, but it is still of value to the speech recognition system. The
problem of obtaining the statistics for this type of semantic knowledge is part of the general
problem of estimating the transition probabilities for a probabilistic grammar.

The transition probabilities for the grammar network can be estimated from statistics for a set
of training sentences. A large set of training sentences should be used, but they only need to be
transcribed orthographically, not phonetically, at this level of the hierarchy. If Bayesian statistics
are used, the a priori probabilities could be set to achieve the same effect as a non-probabilistic
use of the grammar. The a posteriori probabilities would then be a strict improvement (as judged
by performance on the training sentences).
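One simple way to realize such an estimate is to count word-to-word transitions in the training sentences and add a fixed prior count to every arc the grammar permits. The sketch below is an assumption-laden illustration, not the procedure of this work: the function name, the word-level (rather than node-level) treatment of the grammar, and the pseudo-count form of the prior are all choices made for the example, and the words ONE and TWO are hypothetical.

```python
from collections import Counter, defaultdict

def estimate_transitions(sentences, allowed, prior=1.0):
    """Estimate P[w][v] for each arc w -> v permitted by the grammar.

    sentences: list of word lists (orthographic transcriptions).
    allowed:   dict mapping a word to the successor words the grammar permits.
    prior:     pseudo-count added to every permitted arc (must be > 0);
               with no training data this gives equal weight to each
               permitted arc, i.e. a non-probabilistic use of the grammar.
    """
    counts = defaultdict(Counter)
    for words in sentences:
        for w, v in zip(words, words[1:]):
            counts[w][v] += 1
    P = {}
    for w, succs in allowed.items():
        succs = list(succs)
        total = sum(counts[w][v] + prior for v in succs)
        P[w] = {v: (counts[w][v] + prior) / total for v in succs}
    return P

allowed = {"FILE": ["NUMBER"], "NUMBER": ["ONE", "TWO"]}
training = [["FILE", "NUMBER", "ONE"],
            ["FILE", "NUMBER", "ONE"],
            ["FILE", "NUMBER", "TWO"]]
P = estimate_transitions(training, allowed)
print(P["NUMBER"])
```

Transitions observed in training but not permitted by the grammar are ignored here; a larger prior keeps the estimates closer to the unweighted grammar.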

To the extent to which the statistics of the training sentences reflect the true probabilities for
spontaneous utterances for the specific task, the probability network represents not only the
syntax of the task but also all of the predictive information which can be obtained from the
semantics of the available context. That is, if the true probabilities were known, the probability
network would be an optimal predictor for a given amount of context, and therefore would predict
at least as well as a human who is given the same amount of context and who presumably is
capable of understanding the sentence (although the context in this case is not necessarily the
whole sentence).

Inter-sentence semantics can also be introduced into the probability network. One way to use
inter-sentence semantics is to employ a user model. Suppose there is a model for the user in a
particular task such that the model gives probabilities for the user transitioning among a finite
number of states depending on the types of utterances which the user has made. Conceptually this
model fits in easily as an extra level of the Markov hierarchy. Computationally it requires that
conditional probabilities be estimated separately for each user state. A user model is especially
valuable if certain key sentences trigger user transitions with probability one and if for each user
state only a small subset of the general grammar is used. Then there is a savings in both the
computation and the storage requirements.

SUMMARY 

Each of the major sources of knowledge in a speech recognition system can be represented as
a stochastic process (usually in more than one way). In speech recognition each knowledge source
involves an idealized process $X(1), X(2), X(3), \ldots, X(T)$ which is not observed and a process
$Y(1), Y(2), Y(3), \ldots, Y(T)$ depending on the X process. The Y process is either directly observed
or is inferred from lower level knowledge sources in the speech recognition system. Such a dual
process can be modeled as a probabilistic function of a Markov process. In the DRAGON system
such a model is used for each of the knowledge sources.

The speech recognition knowledge sources fit into a hierarchy such that the integrated system
also is a probabilistic function of a Markov process. Such a simple general model for speech
recognition permits a recognition program which is just a simple implementation of a general
network search algorithm. Such an implementation of the DRAGON system is described in
Chapter IV.

INTRODUCTION

In Chapter II, the general properties of a probabilistic function of a Markov process were
discussed. Chapter III explained some of the ways in which the knowledge sources of a continuous
speech recognition system can be represented by such a model. This chapter describes an
implementation of a complete speech recognition system based on these models. This
implementation is intended as a preliminary system demonstrating the practicality of building a
complete system based entirely on the abstract Markov model. It is not intended as a final system
demonstrating the full power of the techniques described here. Each knowledge source is given a
simplified representation, and the probabilities in the networks are estimated a priori rather than
by any automatic re-estimation procedure.

The system is simple, but it is a complete speech recognition system. Starting with knowledge
represented in conventional forms (a context-free grammar, a phonetic dictionary, an arbitrary set
of acoustic parameters), there is a set of programs for constructing the integrated Markov model,
and a general recognition program which can recognize speech for any task based on the integrated
network which has been constructed by the other programs. There is some training which is
dependent on the talker and on the set of acoustic parameters, but which is independent of the task.
This training is done by selecting by hand a set of prototypes for the acoustic segments from a set
of utterances by the talker for whom the system is to be trained.

This implementation of the DRAGON system consists of five programs: MAKDIC,
MAKGRM, MAKNET, GETPRB, and DRAGON. For each program, a brief description will be
given of what it does and of how it does it. The system has been tested on a set of 102 utterances
with about 20 utterances from each of 5 interactive computer tasks. The 5 tasks are VOICE
CHESS (the user speaks his moves while playing chess against the computer), DOCTOR (the user
asks medical questions and the computer simulates a patient), DESK CALCULATOR (the
computer acts as a desk calculator for spoken commands), NEWS (the computer gives the current
news stories whose subjects match a spoken specification), and FORMANT (the computer
generates various kinds of graphic displays of speech data, according to spoken requests). The
grammars for these 5 tasks are given in Appendix B, some sample utterances in Appendix F.

MAKDIC

MAKDIC reads a phonetic dictionary and writes a file describing a network representation for
each word in the dictionary. It is this program which would contain any knowledge of within-word
phonological rules. Actually, the current implementation of DRAGON does not use any explicit
phonological rules, so the output of MAKDIC is just a one-to-one translation of the phonetic
dictionary. Each word is represented by a linear network with each node connected to itself and to
the following node.

A phonetic dictionary including all the words for the 5 tasks is given in Appendix A. The
dictionary is written at a very broad phonetic level and has been edited by hand to break up
diphthongs and stops into acoustic segments. Certain groups of phones which were distinct in the
original dictionary were replaced by a single symbol for each group. This grouping was performed
when the phones within a group were practically indistinguishable under the acoustic
parameterization used in this implementation. The hand editing was designed to achieve an effect
like the lexical model of equations (III.15) and (III.16) of Chapter III, with C = 1.

The list of acoustic segment types which appear in the dictionary is given in Table 1. A
section of the dictionary is shown in Table 2. The complete dictionary is Appendix A. A flowchart
of the MAKDIC program is shown in Figure 3, and a section of its output file is shown in
Table 4. In this implementation, since no phonological rules are applied, the MAKDIC program
just goes through the dictionary word-by-word and goes through each word phone-by-phone.

The section of output shown in Table 4 is interpreted as follows: 251 is the index of the word
"with" in the dictionary. 4 is the number of phonetic segments in the word. For each of the 4
phonetic segments there are two lines. The first 1 in line 2 is the index of the current phonetic
segment within the word. 0 is the internal code for this segment type, and it is followed by the
segment label. The next 1 indicates the number of arcs leading to this node from nodes other than
itself. 0 is the probability of this node being skipped. 900 indicates that the probability of the arc
from this node to itself is .900. (All probabilities are multiplied by 1000 and truncated to integers.)
Next follows a list of all the nodes (other than the node itself) with arcs leading to the current node
(in each case there is only one). The 0 in line 3 is the index within the word of the node which has
an arc leading to the current node. The 100 indicates that the probability of following this arc is
.100. The remaining phonetic segments are represented similarly.

ACOUSTIC SEGMENT LABELS

    -     silence, pause, voice-bar
    AX    (A)BOUT
    B     A(B)OUT (release-aspiration portion)
    AH    N(U)MBNESS
    T     (T)ELL (release-aspiration portion)
    AE    H(A)MMING
    S     (S)EVEN, (Z)ERO
    L     (L)ET
    UW    D(O)
    F     (F)EVER, WI(TH)
    ER    (R)OOK, FEV(ER)
    EH    L(E)T
    IH    K(I)NG
    D     (D)ECIDE (release-aspiration portion)
    P     (P)AWN (release-aspiration portion)
    N     (N)INE
    AO    P(AW)N
    AA    (O)CTAL
    M     (M)UMPS
    SH    BI(SH)OP, MEA(S)URE
    K     (K)ING (release-aspiration portion)
    IY    QU(EE)N
    NX    KI(NG)
    G     (G)IVE (release-aspiration portion)
    Y     (Y)OU
    V     FI(V)E
    W     (W)E
    OW    ZER(O)
    WH    (QU)EEN (release-aspiration and devoiced semi-vowel)
    HH    (H)AMMING
    UH    R(OO)K

TABLE 1

SECTION OF DICTIONARY

    WITH             W IH F
    USING            Y UW S IH NX
    HAMMING          HH AE M IH NX
    HANNING          HH AE N IH NX
    BLACKWELL -      B L AE - K W EH L
    RECTANGULAR -    ER EH - K - T EH IH N - G Y UW L AA ER
    TRIANGULAR -     T ER AA IH EH IH N - G Y UW L AA ER
    FREQUENCY -      F ER IY - K W EH N - S IY
    BANDWIDTH        B AE N - D W IH - D F
    CENTER           S EH N - T ER
    CUTOFF           K AH - T AO F
    LOW              L OW
    PASS             P AE S
    HIGH             HH AA IH

TABLE 2

    Do for WRDNUM = 2 to (number of words in dictionary)
        Read entry from phonetic dictionary
        Output a line giving current word and number of phones in current word
        Do for PHNNUM = 1 to (number of phones in word)
            Output a line: (PHNNUM) (PHNCODE) 1 (SKIPPRB) (REPEATPRB)
            Output: (PHNNUM-1) (1.0-REPEATPRB)
        End of word?
    End of dictionary?

FIGURE 3

SECTION OF DICTIONARY NETWORK LISTING

    251 WITH 4
    1 0 - 1 0 900
    0 100
    2 16 W 1 0 900
    1 100
    3 28 IH 1 0 900
    2 100
    4 7 F 1 0 900
    3 100

TABLE 4
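The output format just described can be reproduced with a short routine. This is a hedged sketch rather than the MAKDIC program itself: the function name and argument layout are hypothetical, probabilities are carried as integer thousandths to avoid floating-point truncation surprises, and the segment codes (0, 16, 28, 7) are taken from a reading of the Table 4 listing, which is partly illegible in the source.

```python
def makdic_entry(index, word, phones, codes, repeat_milli=900, skip_milli=0):
    """Return the listing lines for one word as a linear network.

    Each phone yields two lines:
      (position) (code) (label) 1 (skip prob) (self-loop prob)
      (predecessor position) (probability of the arc from the predecessor)
    Probabilities are integers in thousandths.
    """
    lines = [f"{index} {word} {len(phones)}"]
    for n, ph in enumerate(phones, start=1):
        lines.append(f"{n} {codes[ph]} {ph} 1 {skip_milli} {repeat_milli}")
        lines.append(f"{n - 1} {1000 - repeat_milli}")
    return lines

# Segment codes as read from the Table 4 listing (an assumption of this sketch).
codes = {"-": 0, "W": 16, "IH": 28, "F": 7}
listing = makdic_entry(251, "WITH", ["-", "W", "IH", "F"], codes)
print("\n".join(listing))
```

Because every node has exactly one predecessor other than itself, the count of incoming arcs is always 1 in this linear case; a version with phonological rules would emit one predecessor line per incoming arc.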

MAKGRM 

MAKGRM reads a context-free grammar specified by a BNF representation and writes a
network representation of a related finite-state grammar. In the current implementation each
appearance of a terminal symbol in the BNF is represented by a separate node in the network, but
all appearances of each non-terminal symbol are linked together. This linking implies a loss of
context. For the tasks for which this implementation of the DRAGON system has been used, the
original BNF grammars have been hand edited so that any non-terminal symbol which appeared in
two contexts which were important to keep distinct was replaced by two distinct non-terminal
symbols. A limited expansion of this type could have been performed by the MAKGRM program
itself, but since it was a one-time task, it was done by hand instead.

An example of an expansion of a non-terminal symbol is the symbol <piece> in the VOICE
CHESS grammar (Appendix B). The symbol <piece> names the piece taking the action,
<pieceb> is part of the location for that piece, <piecec> is a piece being captured, and <pieced>
is either part of the location to which a piece is moving or part of the location on which a piece is
being captured.

Note that if either the left contexts or the right contexts are identical for two uses of the same
non-terminal, then the uses do not need to be distinguished. If the left contexts are identical, then
there is no context information to be remembered. If the right contexts are identical, then the left
context information does not influence the interpretation of the rest of the sentence. Note that
<pieced> has two different uses in the CHESS grammar, with different left contexts, but identical
right contexts.

The current version of MAKGRM performs a straightforward translation of the BNF. Each
production is represented by a simple linear network. All the productions with a particular left
hand side are linked together with a dummy node at each end. These dummy nodes are then
linked to any nodes in the grammar which represent uses of the non-terminal symbol that is the left
hand side of these productions. A part of the FORMANT grammar is shown in Figure 5. Figure 6
shows the network in which each production has been represented by a simple linear network.
Figure 7 shows the network after the initial and final nodes for each non-terminal symbol have
been linked to the uses of that non-terminal. A flowchart for MAKGRM is given in Figure 8.

BNF GRAMMAR

    <spec> ::= <phr>
             | <phr> <spec>

    <phr>  ::= A <wind> WINDOW OF <num> POINTS
             | <num> COEFFICIENTS
             | FILE NUMBER <num>
             | UTTERANCE NUMBER <num>

FIGURE 5

PARTIALLY CONNECTED NETWORK

    <spec> ::=   <phr>
                 <phr> -> <spec>

    <phr> ::=    A -> <wind> -> WINDOW -> OF -> <num> -> POINTS
                 <num> -> COEFFICIENTS
                 FILE -> NUMBER -> <num>
                 UTTERANCE -> NUMBER -> <num>

FIGURE 6

SECTION OF GRAMMAR
(network diagram: the productions of Figure 6 with the initial and final nodes of <spec> and <phr>
linked to the uses of those non-terminals)

FIGURE 7


MAKGRM

Read BNF grammar to find all non-terminal symbols.

Set NODENUM = 1

Read one line of BNF grammar.

If symbol is enclosed in brackets <> (it is a non-terminal) then
   1) Mark current node as non-terminal.
   2) Find symbol in list of non-terminals; set SYMNUM to the index of the symbol in the list.
   3) NODENUM = NODENUM + 1

Otherwise symbol is a terminal symbol; then
   1) Mark node as a terminal.
   2) Find symbol in lexicon; set SYMNUM to index of word in lexicon.
   3) NODENUM = NODENUM + 1

End of line?
   If yes then mark last node as the end of a production.

End of grammar?

Do for NODENUM = 1 to (number of nodes which have been created)

   If current node is the initial node for a non-terminal symbol, then introduce an arc into
   the network connecting each node representing a use of this non-terminal with this initial
   node.

   If current node is the final node for a non-terminal, then introduce an arc connecting each
   node which ends a production for this non-terminal with this final node.

FIGURE 8
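The two passes of MAKGRM can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the program itself: the function names, the in-memory grammar format, and the zero-based node numbering are hypothetical, and arcs are stored as predecessor lists. The first pass builds one linear chain per production between an initial and a final dummy node. The second pass links every use of a non-terminal to those dummy nodes, so that a use leads into the initial node and whatever followed the use now follows the final node.

```python
def make_grammar_network(grammar):
    """Build a predecessor-list network from a BNF-like grammar.

    grammar maps each non-terminal (e.g. "<phr>") to a list of productions,
    each production being a list of symbols. (Hypothetical input format.)
    """
    nodes = []   # nodes[k] = (kind, symbol)
    preds = []   # preds[k] = node numbers with an arc leading to node k
    initial, final = {}, {}

    def new_node(kind, symbol, predecessors=()):
        nodes.append((kind, symbol))
        preds.append(list(predecessors))
        return len(nodes) - 1

    # Pass 1: one linear chain per production, between two dummy nodes.
    for lhs, productions in grammar.items():
        initial[lhs] = new_node("initial", lhs)
        ends = []
        for production in productions:
            prev = initial[lhs]
            for symbol in production:
                kind = "nonterminal" if symbol.startswith("<") else "terminal"
                prev = new_node(kind, symbol, [prev])
            ends.append(prev)
        final[lhs] = new_node("final", lhs, ends)

    # Pass 2: link each use of a defined non-terminal to its dummy nodes.
    for use, (kind, symbol) in enumerate(nodes):
        if kind != "nonterminal" or symbol not in grammar:
            continue
        # Whatever followed the use node now follows the final dummy node.
        for k in range(len(nodes)):
            rewired = []
            for p in preds[k]:
                q = final[symbol] if p == use else p
                if q not in rewired:
                    rewired.append(q)
            preds[k] = rewired
        # The use node leads into the initial dummy node.
        preds[initial[symbol]].append(use)
    return nodes, preds, initial, final


def arc_probabilities(preds):
    """Default rule: equal probability for all arcs leading to the same node."""
    return [[1.0 / len(p) for _ in p] for p in preds]


# The section of the FORMANT grammar shown in Figure 5.
FIGURE_5 = {
    "<phr>": [["<spec>"], ["<phr>", "<spec>"]],
    "<spec>": [
        ["A", "<wind>", "WINDOW", "OF", "<num>", "POINTS"],
        ["<num>", "COEFFICIENTS"],
        ["FILE", "NUMBER", "<num>"],
        ["UTTERANCE", "NUMBER", "<num>"],
    ],
}
```

Non-terminals that are used but not defined in the fragment (<wind> and <num> here) are simply left as unlinked nodes in this sketch.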


MAKNET 

MAKNET takes as input a network representation of a grammar (produced by MAKGRM) and a network
representation of the dictionary (produced by MAKDIC). It produces an integrated network by
substituting the appropriate word network for each node in the grammar network. Phonological
rules which apply across word boundaries could be used to adjust the network after the
substitution.

MAKDIC, MAKGRM, and MAKNET must keep track of the transition probability associated with each
arc of the network. At present simple default values are used. MAKDIC assigns a probability of .9
to any arc leading from a node back to itself, and .1 for any arc leading to the next node. This
corresponds to acoustic parameters sampled once every 10 milliseconds, with no presegmentation,
and an average phone duration of 100 milliseconds, based on the acoustic-phonetic model of
equations (III.12), (III.13), and (III.14).
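The correspondence between the default probabilities and the 100 millisecond average can be checked with a short calculation. The sketch below assumes the simplest reading, in which a phone is a single node whose self-loop has probability .9, so that the number of 10 millisecond samples spent in the node is geometrically distributed with mean 1/(1 - .9) = 10. The function names are illustrative only.

```python
FRAME_MS = 10    # one acoustic sample every 10 milliseconds
P_STAY = 0.9     # default probability of the arc from a node back to itself


def expected_duration_ms(p_stay=P_STAY, frame_ms=FRAME_MS):
    """Mean time spent in a node whose self-loop has probability p_stay."""
    return frame_ms / (1.0 - p_stay)


def duration_probability(n_frames, p_stay=P_STAY):
    """Probability of staying exactly n_frames samples (n_frames >= 1)."""
    return p_stay ** (n_frames - 1) * (1.0 - p_stay)
```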

Output a representation of the network

FIGURE 9

The complete input and output for MAKGRM and MAKNET is shown for a simple language in Appendix
C. First the simple BNF grammar is given. Next the output file of MAKGRM is shown. Consider the
productions with the non-terminal symbol <request> as the left-hand side. The sub-network for
these productions begins with the line "<request>::= 6 -2 1." The 6 is the node number for this
node, which is the special initial node for this left-hand side. -2 indicates that this node is
associated with the second non-terminal symbol. 1 indicates that this node has only 1 arc leading
to it. (In this implementation, each arc is listed with the node to which the arc points, and
transition probabilities are given conditional on the state after the transition, rather than in
the conventional form presented in Chapter II. This form has been chosen for the convenience of
the implementation; the two theoretical models are equivalent.) 2 (on the next line) is the node
number of the node with an arc leading to the current node, and 1000 indicates that the
probability of following this arc is 1.000.

"Compute" is the word associated with the next node, which is node 7. It is a terminal symbol
and 291 is its index in the dictionary. This node has 1 predecessor, which is node 6 (with
probability 1.000). Node 8 is associated with the third (-3) non-terminal symbol <func-phr>. The
node has 1 predecessor, node 7. Node 9 is associated with the word "Use" which has index 222. The
node has 1 predecessor, node 6 (which is the initial node for this set of productions). Node 10 is
associated with the non-terminal symbol <param-phr>, and its only predecessor is node 9. Node 11
is the final node for this set of productions (with <request> as the left-hand side). It has two
predecessors, node 17 and node 32, which are equally likely. Node 17 is the final node for the
productions for the symbol <func-phr>, which is associated with node 8. Node 32 is the final node
of the productions for the symbol <param-phr>.
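A reader for node records of this kind might look like the following sketch. The exact file layout is in Appendix C and is not reproduced here, so the layout below is an assumption: a header line holding a label, the node number, the symbol index (negative for a non-terminal), and the number of incoming arcs, followed by one line per arc holding the predecessor node and the probability multiplied by 1000. The names Node, parse_network, and SAMPLE are hypothetical; the two sample records use only the values quoted in the text for nodes 6 and 7.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    number: int
    symbol_index: int   # negative: non-terminal; positive: dictionary index
    arcs: list = field(default_factory=list)   # (predecessor, probability)

    @property
    def is_nonterminal(self):
        return self.symbol_index < 0


def parse_network(text):
    """Parse node records; assumes whitespace-separated fields (hypothetical)."""
    lines = iter(ln.split() for ln in text.splitlines() if ln.strip())
    nodes = {}
    for header in lines:
        label, number, symbol_index, n_arcs = header[0], *map(int, header[1:4])
        node = Node(label, number, symbol_index)
        for _ in range(n_arcs):
            pred, prob = next(lines)
            node.arcs.append((int(pred), int(prob) / 1000.0))
        nodes[number] = node
    return nodes


# Records for nodes 6 and 7 as described in the text (layout assumed).
SAMPLE = """
<request>::= 6 -2 1
2 1000
Compute 7 291 1
6 1000
"""
```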


MAKGRM assigns an equal probability to all arcs leading to the same node. This default
condition implies that the DRAGON system is currently using no semantic knowledge, not even
statistically (except for any semantic knowledge which is included in the grammar itself).


The output of MAKNET is a combination of the outputs of MAKDIC and MAKGRM. Each node
corresponds to an acoustic segment. Except at word boundaries, each node has only one
predecessor besides itself. Notice that there are many nodes marked "-". These silence nodes are
common because the dictionary indicates that every word begins with a silence (because the word
may be preceded by a pause). The dynamic time warping is sufficiently powerful that these
silences can be allowed throughout the network. If no silence is actually present in the acoustic
signal, then the dynamic time warping will shrink the duration of time assigned to the "-" node to
a single 10 millisecond segment.


GETPRB 


GETPRB takes as input a set of acoustic parameter values and produces as output a vector of
probability estimates. Each entry in the probability vector represents the conditional probability
of producing the given set of acoustic parameter values, conditional on the actual phone at the
time of the acoustic observation being the phone corresponding to that particular position in the
probability vector.


GETPRB

Do for PHONENUM = 1 to (number of phonetic labels)

   Compare current acoustic parameters with each prototype of current phone. Find the prototype
   which is the minimum distance from the current parameter vector.

   $P = \max\left(0, \min\left(1, \frac{1000}{\sum_i (A_s(i) - A_p(i))^2}\right)\right)$

   PRB(PHONENUM) = P

   Last phone? (NO: repeat for the next phone; YES: done)

FIGURE 10


Any convenient set of acoustic parameters and any matching procedure could be used here. The
current version of the DRAGON system uses 12 acoustic parameters sampled once every 10
milliseconds. The basic parameters are an amplitude measure and a zero-crossing count for each of
five filter bands, and for the unfiltered signal. The five filter bands are

A1, Z1:  200-400 Hertz

A2, Z2:  400-800 Hertz

A3, Z3:  800-1600 Hertz

A4, Z4:  1600-3200 Hertz

A5, Z5:  3200-6400 Hertz

AU, ZU are for the unfiltered signal.

The vector of twelve parameters is normalized in a non-linear fashion by dividing A1, Z1, A2,
Z2, A3, Z3, A4, Z4, A5, Z5 each by the sum of the twelve parameters and multiplying by 1000. No
attempt has been made to find an optimal non-linear transformation; this transformation has been
selected by informal experimentation with a small number of alternative transformations. The
reason a transformation is introduced is that so many of the consonants are so low in amplitude in
all the bands that they are difficult to separate by any simple metric. The measurements on the
unfiltered signal, AU and ZU, are not normalized, so they retain the information of overall
amplitude.
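The normalization just described is short enough to write out. The following is a minimal sketch, assuming the twelve parameters are held in a dictionary keyed by the names used above. The guard for an all-zero frame is an added assumption, since the behaviour for that case is not stated here.

```python
BAND_KEYS = ["A1", "Z1", "A2", "Z2", "A3", "Z3", "A4", "Z4", "A5", "Z5"]
UNFILTERED_KEYS = ["AU", "ZU"]


def normalize(params):
    """Scale the ten band parameters by 1000 / (sum of all twelve parameters).

    AU and ZU are left unchanged so that overall amplitude is retained.
    """
    total = sum(params[k] for k in BAND_KEYS + UNFILTERED_KEYS)
    out = dict(params)
    if total > 0:   # assumption: an all-zero frame is passed through unchanged
        for k in BAND_KEYS:
            out[k] = 1000.0 * params[k] / total
    return out
```

Because every band value is divided by the same total, a weak consonant and a louder instance of the same sound receive similar band values, while AU and ZU still distinguish them.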

The amplitude measures and zero-crossing counts are normalized together because, especially
for the low amplitude cases that we are trying to separate, the zero-crossing counts also give a
kind of amplitude measure. This phenomenon occurs because the zero-crossing counter only counts
cycles which exceed a certain threshold. Thus for signals whose amplitude is near the threshold,
the zero-crossing count is actually a sensitive measure of the amplitude. For strong signals the
zero-crossing count measures the frequency of the major spectral peak within a particular band.

GETPRB measures the distance between a particular vector of (normalized) acoustic parameter
values and a particular prototype by a simple Euclidean distance. However, there are several
prototypes for each phone. The prototypes were selected by hand from a set of 50 training
sentences spoken by the same talker as the one on whom the system has been tested.

One prototype for each phone was found among the 50 sentences by hand. Each prototype was just
the (normalized) vector of acoustic parameter values for some 10 millisecond segment occurring
during an instance of the desired phone. Using the GETPRB from these initial prototypes, DRAGON
was run as a machine-aided labeling program on the same 50 sentences (that is, DRAGON was told
the sequence of words in each sentence, but not the times at which they occurred).


The output of the machine-aided labeling was then carefully checked by hand (there were about
one or two corrections per sentence). The labels produced by GETPRB were then compared with this
hand-checked segmentation. Whenever there was a steady-state acoustic segment for which no
prototype had probability greater than .1, a new prototype was added for the phone which the hand
segmentation marked as occurring at that time.

An arbitrary transformation is applied to convert the Euclidean distance measure to an
estimate of the conditional probability. The transformation is given by equation (1).

(1)  $P = \max\left(0, \min\left(1, \frac{1000}{\sum_i (A_s(i) - A_p(i))^2}\right)\right)$,

where $A_s(i)$ is the value of the i-th acoustic parameter for the current sample, and $A_p(i)$ is
the value of the i-th acoustic parameter in the prototype.
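Equation (1) and the nearest-prototype rule can be sketched as follows. This is an illustration rather than the GETPRB code: the function names and the dictionary of prototypes are hypothetical, and returning 1 for an exact match (zero distance) is an added assumption to avoid dividing by zero.

```python
def squared_distance(sample, prototype):
    """Square of the Euclidean distance between two parameter vectors."""
    return sum((s - p) ** 2 for s, p in zip(sample, prototype))


def probability_from_distance(d2):
    """Equation (1): P = max(0, min(1, 1000 / squared distance))."""
    if d2 == 0:
        return 1.0   # assumption: an exact match gets the maximum value
    return max(0.0, min(1.0, 1000.0 / d2))


def getprb(sample, prototypes_by_phone):
    """One estimate per phone, taken from that phone's nearest prototype."""
    return {
        phone: probability_from_distance(
            min(squared_distance(sample, proto) for proto in prototypes)
        )
        for phone, prototypes in prototypes_by_phone.items()
    }
```

Under this transformation any prototype within a squared distance of 1000 scores 1, and the threshold of .1 used when adding prototypes corresponds to a squared distance of 10000.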

A sample of the acoustic labeling produced by GETPRB is given in Appendix D for a portion of
the utterance "Use a Hamming window of five hundred twelve points." First a table of the values
of the 12 (normalized) acoustic parameters is given; then a table of the top 7 prototypes for
each 10 millisecond segment is given. Each row in each table represents one 10 millisecond
segment. The segment number is in the first column. In the parameter table the remaining
columns are the values of Z1, A1, Z2, A2, Z3, A3, Z4, A4, Z5, A5, ZU, and AU, respectively.

In the table of labels, each label is followed by a number which is its index in the list of
prototypes. Frequently several prototypes for the same label occur among the top 7 prototypes.
The final two columns are the squares of the Euclidean distances from the current set of acoustic
parameter values to the best and second best prototypes.

From time 95 to time 108, the parameters are almost all 0, and "-" is the best prototype. Then
"Y" is the best label from 109 to 111. "UW" is best, or one of the best, from 113 to 134.
Occasionally another label (IY, AX, L) is rated best, but none of these labels scores high
throughout the time from 113 to 134. This section of time would reliably be marked as "UW" from
the acoustic information alone. The section from 136 to 138 is a transition between the "UW" and
the "S," and no label scores well. From 139 to 144 is the "S." Notice that parameters A4 and Z4
are 0 throughout this segment. This is a feature for distinguishing "S" from "SH," and the system
reliably labels "S" and "SH" with these acoustic parameters.

There is no real acoustic evidence for the word "a," and the vowels and nasals of the word
"Hamming" are not very clear. At this point the value of an integrated system with other sources
of knowledge becomes clear. Rather than doing segmentation and labeling from the acoustics
alone, the system makes all decisions in terms of the integrated network representation. The
system was able to select, using the labels shown here, the word "Hamming" over all alternatives,
including the word "Hanning." However, the system missed the word "twelve" later in the
utterance.

DRAGON 

The main recognition program, DRAGON, is just an implementation of equations (18), (19), and
(20) of Chapter II. The B matrix is provided in implicit form by the procedure GETPRB. The A
matrix is represented by the network produced by MAKNET and the default transition
probabilities. In comparison with a general transition matrix, the matrix is very sparse (almost
all of its entries are zero). The network corresponds to a compacted representation of the
transition matrix. Each node in the network corresponds to a row of the matrix, and each non-zero
entry in that row corresponds to an arc in the network leaving that node. Since there are usually
only two non-zero entries per row, the representation is very compact. Thus the 2356x2356 element
transition matrix for the formant tracking task is stored in a few thousand memory locations.
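The saving can be made concrete with a toy network. The sketch below assumes a pure left-to-right chain in which each node has an arc to itself (.9) and an arc to the next node (.1). The real integrated network also branches at word boundaries, so the count is only indicative. The names are illustrative.

```python
def chain_network(n_nodes, p_stay=0.9):
    """Arc lists for a left-to-right chain: arcs[j] = [(next_node, probability)]."""
    arcs = {}
    for j in range(n_nodes):
        row = [(j, p_stay)]
        if j + 1 < n_nodes:
            row.append((j + 1, 1.0 - p_stay))
        arcs[j] = row
    return arcs


def stored_entries(arcs):
    """Number of non-zero transition probabilities actually stored."""
    return sum(len(row) for row in arcs.values())


N_NODES = 2356                  # size of the formant tracking network
DENSE_ENTRIES = N_NODES ** 2    # entries in a full transition matrix
SPARSE_ENTRIES = stored_entries(chain_network(N_NODES))
```

For 2356 nodes the full matrix would have 5,550,736 entries, while the chain stores 4711 arcs.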

Equation (20) of Chapter II requires that a back pointer be saved telling the best way to get
to each node at each point in time. Again it is possible to make use of the extreme sparseness of
the A matrix. Since a list is kept of all arcs leading to a given node, a compact back pointer can
be kept using only enough bits to select one of the short list of arcs. These back pointers are
stored as variable length bytes, fitting as many pointers per memory location as possible. This
packed representation of the back pointers makes it possible for the current version of DRAGON to
keep all the back pointers for a six second utterance in core memory. In fact, the back pointers
for a given 10 millisecond segment for the formant tracking task fit in 73 memory locations (36
bits each).

DRAGON

Do for t = 1 to (number of 10 millisecond segments in utterance)

   Call GETPRB

   Do for j = 1 to (number of nodes in integrated network)

      For each i, such that i is a predecessor of current node j, compute
      $\gamma(t-1,i)\,a_{i,j}$. Set g(t,j) to the maximum of these. Save pointer to the i for
      which the maximum occurs (save it in bit-packed form).

      Last node? (NO: next j; YES: continue)

   Do for j = 1 to (number of nodes)

      PHONE = the phone associated with this node

      $\gamma(t,j) = g(t,j)\,\mathrm{PRB}(\mathrm{PHONE})$

      Last node? (NO: next j; YES: continue)

   End of utterance? (NO: next t; YES: continue)

Do for t = T-1 by (-1) to 1

   Find NODE(t) from back pointer from NODE(t+1)

Output the list of words

FIGURE 11
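The bit-packed back pointers can be illustrated as follows. This is a sketch under assumptions, not the original layout: each pointer is taken to be the position of the chosen arc in the node's list of incoming arcs, stored in just enough bits to index that list, and pointers are assumed not to straddle a 36-bit word. The function names are hypothetical.

```python
WORD_BITS = 36   # one memory location


def pointer_bits(n_preds):
    """Bits needed to select one of n_preds incoming arcs (0 if only one arc)."""
    return max(n_preds - 1, 0).bit_length()


def pack_frame(choices, n_preds):
    """Pack one back pointer per node into a list of 36-bit words."""
    words, current, used = [], 0, 0
    for choice, n in zip(choices, n_preds):
        width = pointer_bits(n)
        if width == 0:
            continue                      # a single arc needs no pointer
        if used + width > WORD_BITS:      # start a new word, never straddle
            words.append(current)
            current, used = 0, 0
        current |= choice << used
        used += width
    if used:
        words.append(current)
    return words


def unpack_frame(words, n_preds):
    """Recover the list of back pointers written by pack_frame."""
    choices, index, used = [], 0, 0
    for n in n_preds:
        width = pointer_bits(n)
        if width == 0:
            choices.append(0)
            continue
        if used + width > WORD_BITS:
            index, used = index + 1, 0
        choices.append((words[index] >> used) & ((1 << width) - 1))
        used += width
    return choices
```

If every one of the 2356 nodes had exactly two incoming arcs, one bit per node would fill 66 words under this scheme, which is the same order as the 73 locations quoted above; nodes at word boundaries have more predecessors and need wider pointers.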

A flowchart of the DRAGON program is shown in Figure 11. The program performs the computation
of equation (18) for t = 1 through T. Each node j is considered in turn. Since in this
implementation the implicit $b_{i,j,k}$ is independent of i, the value of i for which the maximum
occurs in equation (18) depends only on $\gamma(t-1,i)$ and $a_{i,j}$. This value is found and
saved as a back pointer. If p is the phone corresponding to node j, then the $b_{i,j,k}$ for the
current acoustic parameter values is the number which GETPRB returns in position p of the
probability vector. The computation of $\gamma(t,j)$ is completed by multiplying by this factor.


Chapter  IV  — IMPLEMENTATION 


Page  52 


Once the computation of equation (18) has been done for t = 1 through T, the back pointers are
retrieved according to equations (19) and (20). The maximum in equation (19) is taken only over
those nodes which represent the end of a complete utterance. For the grammars which have actually
been used, this set has always consisted of a single node. As the back pointers are traced back,
the optimal sequence of internal states for the Markov process is obtained. Since each node in the
network corresponds to an acoustic segment within the acoustic realization of a particular
phoneme, which is within a particular word, which is in a particular place in the grammar, the
sequence of states determines the word sequence, the phone sequence, the segmentation times, and
the parse of the sentence. Whichever sequence is of interest can be printed out.
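The forward computation and the trace-back can be put together in a compact sketch. It follows the structure described above (predecessor lists, one call to an acoustic routine per sample, one saved pointer per node per sample), but the function signature, the data layout, and the toy example are assumptions made for illustration. Probabilities are multiplied directly, as in the equations; a practical program for long utterances would have to guard against underflow, for example by working with logarithms.

```python
def viterbi(preds, phone_of, start_scores, frames, getprb, final_nodes):
    """Optimal state sequence through a network stored as predecessor lists.

    preds[j]      list of (i, a_ij) for every arc leading to node j
    phone_of[j]   phone label associated with node j
    start_scores  score of each node before the first sample
    getprb(frame) returns a mapping from phone label to probability estimate
    final_nodes   nodes which represent the end of a complete utterance
    """
    gamma = list(start_scores)
    back = []   # back[t][j] = position in preds[j] of the best predecessor
    for frame in frames:
        prb = getprb(frame)
        new_gamma, pointers = [], []
        for j, arcs in enumerate(preds):
            best, best_k = 0.0, 0
            for k, (i, a_ij) in enumerate(arcs):
                score = gamma[i] * a_ij
                if score > best:
                    best, best_k = score, k
            new_gamma.append(best * prb[phone_of[j]])
            pointers.append(best_k)
        gamma = new_gamma
        back.append(pointers)

    # Trace back from the best final node to the first sample.
    node = max(final_nodes, key=lambda j: gamma[j])
    score = gamma[node]
    path = [node]
    for pointers in reversed(back[1:]):
        node = preds[node][pointers[node]][0]
        path.append(node)
    path.reverse()
    return path, score


# Toy example: a three-node chain with the default .9 / .1 probabilities.
TOY_PREDS = [[(0, 0.9)], [(1, 0.9), (0, 0.1)], [(2, 0.9), (1, 0.1)]]
TOY_PHONES = ["A", "B", "C"]


def toy_getprb(frame):
    # Hypothetical acoustic routine: .9 for the matching label, .1 otherwise.
    return {phone: (0.9 if phone == frame else 0.1) for phone in TOY_PHONES}
```

Note that the inner loops always visit every node and every arc, whatever the acoustic evidence, so the work per sample is fixed by the size of the network.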

PERFORMANCE  RESULTS 


The current implementation of the DRAGON system has been tested on a total of 102 utterances,
with about 20 utterances from each of five interactive computer tasks (described briefly on page
34). In Tables 12-14, the performance of the DRAGON system is compared with the performance of
the HEARSAY speech understanding system. Because this implementation of the DRAGON system has no
semantic component, the semantic module of the HEARSAY system was disabled for this experiment.
These results were obtained by Lowerre [L3] in a study of the comparative strengths and
weaknesses of the two systems. Both of the systems used the 12 acoustic parameters described
above, sampled once every 10 milliseconds.

The percentage of utterances correctly recognized in each task by each system is given in
Table 12. All 102 of these utterances are by the same talker. The percentage of words correctly
identified is given in Table 13. The amount of computation time required by the current system is
given in Table 14. These times are the amount of central processor time on a PDP-10 computer as a
multiple of the length of the utterance.

Overall the DRAGON system recognized 49% of the 102 utterances and identified 83% of the 578
words. An utterance is counted as being correctly recognized if all of the words in the utterance
are correctly analyzed. Because of factors such as varying sentence length, the percentage of
words correctly identified is more stable for different tasks than the percentage of utterances
recognized. Notice that the DRAGON system maintained a level of 84% of the words correctly
identified on the interactive formant tracking task.

ACCURACY OF UTTERANCES RECOGNIZED

                               Hearsay   Dragon    Hearsay   Dragon
           size of   no. of    %         %         %         %
Task       lexicon   utts      correct   correct   missed    missed

Chess         24       22        32        68         9         0
Doctor        66       21        24        76        33         0
DesCal        37       23        22        17        13         8
News          28       18        50        50        11         0
Formant      194       18        33        33        44         5

Total                 102        31        49        21         3

The % correct figure is the percent of the total utterances that were correctly recognized. The
% missed figure is the percent of the total utterances that were completely missed, i.e. no words
were correctly identified.

TABLE 12


ACCURACY OF WORDS IDENTIFIED

                               Hearsay   Dragon
           size of   no. of    %         %
Task       lexicon   words     correct   correct

Chess         24      130        69        94
Doctor        66       92        49        88
DesCal        37      116        53        63
News          28       98        74        84
Formant      194      142        33        84

Total                 578        55        83

TABLE 13

The FORMANT task is considerably more complex than the other tasks. It has a vocabulary of 194
words and an infinite language with approximately $16^n$ sentences of length n words. Each of the
other tasks has a finite language with the number of possible sentences ranging up to several
hundred million. The HEARSAY system was able to recognize 33% of the utterances for this task,
but it only identified 33% of the 142 words. It missed 44% of the utterances completely, and the
standard deviation of its computation time is higher than for the other tasks.


This implementation of the DRAGON system was developed using training sentences (by the same
talker) from the tasks CHESS, DOCTOR, and FORMANT. The HEARSAY system was developed for tasks
CHESS, DOCTOR, DESCAL, and NEWS. In no instance were any of the utterances used in training the
systems included in the test results reported here. One reason the performance of the DRAGON
system on the DESCAL task was inferior to its performance on the other tasks is that the DESCAL
task includes several words which are syntactically equivalent and which are phonetically similar
under the analysis used by the current system. No attempt has been made to provide extra phonetic
prototypes for this task.

TIME NEEDED FOR RECOGNITION

           Hearsay                        Dragon
           ave. times                     ave. times                     size of
           real       Std.                real       Std.                Dragon
Task       time       Dev.     SD/ave     time       Dev.     SD/ave     network

Chess        13.7      2.6      .19         48.0      .6       .013        410
Doctor        9.4      3.8      .40         67.4     1.1       .016        702
DesCal       15.5      9.4      .61         83.1     1.0       .012        916
News         10.8      6.4      .59         54.7      .6       .011        498
Formant      44.4     23.5      .53        173.8     3.3       .019       2356

For the DRAGON system:

(recognition time) = (utt length)(20.9 + .067(net size))

This is accurate to within about 3%.

TABLE 14
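The linear timing relation quoted with Table 14 can be checked against the table's own figures. The short sketch below uses the network sizes and average DRAGON times (as multiples of real time) from Table 14; the variable and function names are illustrative.

```python
# (size of Dragon network, average recognition time in multiples of real time)
TABLE_14_DRAGON = {
    "Chess": (410, 48.0),
    "Doctor": (702, 67.4),
    "DesCal": (916, 83.1),
    "News": (498, 54.7),
    "Formant": (2356, 173.8),
}


def predicted_times_real_time(net_size):
    """(recognition time) / (utterance length) = 20.9 + .067 (net size)."""
    return 20.9 + 0.067 * net_size


def relative_errors(table=TABLE_14_DRAGON):
    """Relative difference between the formula and the measured average."""
    return {
        task: (predicted_times_real_time(size) - measured) / measured
        for task, (size, measured) in table.items()
    }
```

The largest discrepancy is for the FORMANT task, where the formula gives about 178.8 against a measured 173.8, a little under 3%.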

The small standard deviation in processing time for different utterances within a task is a
feature of the optimal search algorithm used in the DRAGON system. A complete search is done for
the globally optimum path through the network. The Markov model allows this global optimum to be
found in a time which is proportional to the length of the utterance. If the words are clear and
easily recognized, the complete search takes just as long as when the words are unclear and
difficult to recognize. On the other hand, the system never takes longer than this fixed time,
and it always finds some path through the network. In Table 15, results are given for an earlier
version of the DRAGON system for each of the 18 utterances in the FORMANT task. The property
which should be noticed in these figures is that the processing time does not depend on how many
errors are made in analyzing an utterance.

ACCURACY AND TIME FOR INDIVIDUAL UTTERANCES
Task: Interactive Formant Tracking
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9 

9 

9 

7 

10 

10 
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8 

7 

7 

7 
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1 1 

1 1 
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7 

6 
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4 

4 

4 

12 

10 

9 

8 

13 

4 

4 

4 

14 

4 

3 

0 

15 

10 

9 

8 

16 

1 1 

1 1 

7 

17 

10 

10 

8 

18 

6 

6 

6 
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correct  l/(  words 

lOi  - 

m2 

[words 

corrcctl/lwords 
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*90 

(words 

semantically  corrcctl/lwords 

out) 

6 
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18.7 

8 

4270 
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8 
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7 

3690 

1 18.5 

18.6 

5 

3490 
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18.6 

9 

5670 

115.9 

18.5 

10 

4510 

121.2 

18.4 

7 

3200 
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18.3 

1 1 

5120 

1 18.  1 

17.6 

6 

3300 

120.0 

17.5 

4 

307U 

1 19.6 

18.5 

8 

4480 

1 1B.0 

18.7 

4 

2760 

124.0 

18.8 

0 

2300 

131.2 

18.5 

9 

4260 

126.3 

19.2 

8 

5160 

119.7 

18.7 

9 

4060 

121.9 

17.9 

6 

31  10 

123.4 

' 7 . 9 

.919 


In     = Number of words in actual (input) phrase
Out    = Number of words in output phrase
Cor    = Number of words correctly identified
SemCor = Number of words semantically correct (error irrelevant to task)
Length = Duration of phrase in milliseconds
Main   = (computation time of main recognition routine)/Length
Aco    = (computation time of acoustics module)/Length


TABLE  15 

The 18 utterances are shown in Table 16. In each pair the actual utterance is given, followed
by the utterance which the DRAGON system found as the optimal path in its model. The system
correctly recognized 8 of the 18 utterances. If we consider "compare" (in sentence 15) to have
the same meaning as "look at" and if we consider "compare A and B" to be equivalent to "compare A
with B" (in sentence 9), then 10 of the 18 sentences or 55% are semantically correct. A
sophisticated semantic component might be able to correct some of the other errors. Appendix E
also shows the correct and estimated utterances for the other two tasks for this implementation
of DRAGON, and 9 sentences in the AP News task and H sentences in the formant task for an earlier
version of DRAGON.

Utterances for Interactive Formant Tracking Task

 1)  I want to do formant tracking.
     I want to do formant tracking.

 2)  Use a Hamming window of five hundred twelve points.
     Use a Hamming window of five hundred points.

 3)  Use utterance number six of file number five.
     Use utterance number six of file number five.

 4)  Increment the window in steps of one hundred points.
     Increment the window in steps of four points.

 5)  For each window, display the Fourier spectrum.
     For each window, display the formant tracks.

 6)  Compute the LPC smoothed spectrum using the autocorrelation method.
     Compute the LPC smoothed spectrum using the autocorrelation method.

 7)  Compute the roots of the inverse filter using Bairstow's method.
     Compute the roots of the inverse filter using Bairstow's method.

 8)  Display the imaginary part of the roots.
     Display the imaginary part of the roots.

 9)  I want to compare the autocorrelation method with the covariance method.
     I want to compare the autocorrelation method and the covariance method.

10)  Increment the window by one hundred points.
     Increment the window by one points.

11)  Display the FFT spectrum.
     Display the FFT spectrum.

12)  Use a Hanning window of two hundred fifty-six points.
     Use a Hanning window of two hundred six hertz.

13)  Display the FFT spectrum.
     Display the FFT spectrum.

14)  Compute the Hilbert transform.
     Use two points.

15)  I want to look at image enhancement with different parameters.
     I want to compare image enhancement with different parameters.

16)  Display the spectrogram with a pre-emphasis of six decibels per octave.
     Display the spectrogram to a pre-emphasis of six thousand five hertz.

17)  Use a ceiling of thirty with a floor of zero.
     Use a ceiling of ten to a floor of zero.

18)  For each utterance display the spectrogram.
     For each utterance display the spectrogram.

TABLE 16

By considering the specific words which the system identified incorrectly, it is possible to
gain some insight about the places at which the model is weakest and/or the task is most
difficult. The errors for the FORMANT task are given in Table 17.

ERRORS IN FORMANT TASK

       actual phrase            substitution

 2)    twelve
 4)    one hundred              four
 5)    Fourier spectrum         formant tracks
 9)    with                     and
10)    hundred
12)    fifty
       points                   hertz
14)    (entire sentence missed)
15)    look at                  compare
16)    with                     to
       decibels per octave      thousand five hertz
17)    thirty with              ten to

TABLE 17


Six of the twelve places at which errors occur involve numbers. It is not surprising that
numbers are the greatest point of weakness. In any context in which a number can occur, any
number less than one billion is considered grammatical (sometimes including zero). The system has
no source of knowledge other than acoustics to select which of the one billion possible numbers
was actually spoken. Recognizing a number imbedded in continuous speech from acoustic information
alone is a difficult task, and the one-out-of-a-billion selection is usually beyond the ability of
this simple general system.


The prepositions and conjunctions are the second greatest source of errors. These function
words are usually short and unstressed, so the acoustic information is very unreliable. Previous
speech recognition studies ([T3]) have shown that short words are missed more often than long
words, and that unstressed function words are missed even more often than other short words. On
the other hand, it is often possible to "understand" a sentence as a whole without correctly
identifying all the prepositions and conjunctions.


Of the remaining errors, two are caused entirely by a weakness in the model. The original BNF
grammar specifies that a "window" length (sentence (12)) be given as a number of "points," and a
"pre-emphasis" be specified in "decibels per octave" or "db per octave." In translating the BNF
grammar to a finite state grammar, these restrictions were removed. These restrictions could have
been retained in the finite state grammar, but only by having a larger state space. Six copies of
the number sub-grammar would suffice to distinguish the uses of number with different right
contexts ("points", "hertz", <res-unit>, "coefficients", "per octave", and end-of-phrase). If
these two errors were corrected with an expanded grammar, all of the remaining semantically
important errors would be numbers, except for sentences (5) and (14).


The current simple implementation of the DRAGON system has been designed merely to demonstrate
the practicality and power of its general concepts. Clearly many improvements are possible. For
example, the acoustic data could be pre-processed and organized into phone-like segments. Then
the calculations represented by equations (II.18) and (II.20) would only need to be done for each
segment rather than for each 10 millisecond acoustic parameter sample. This reformulation would
speed up the calculation in the main recognition program by a factor of about three or four.
Especially for larger tasks, substantial savings in computation time can be achieved by employing
less than a complete optimal search. A careful study must be done to determine the trade-offs
between performance and amount of computation with sub-optimal techniques. More sophisticated
models are possible for the knowledge sources, which ought to improve the performance although
they would generally increase the amount of computation. A true probabilistic grammar would allow
a statistical representation of some semantics as well as a more accurate grammar.

CONCLUSIONS 

Let's review the major features of the DRAGON speech recognition system and consider how
these features influence the performance of this implementation. Some of the features of the
DRAGON system contribute to its simplicity and ease of implementation, while others give it its
power.

(1) Generative form of the model

The fact that the abstract model represents knowledge sources in a generative form made
MAKGRM and MAKDIC much simpler to implement. The DRAGON network explicitly
represents a finite state grammar. Although the underlying stochastic process is assumed to be
Markovian, sufficient context is included in the formulation of the state space so that the finite
state grammar is represented exactly. It is not necessary to make any compromise to represent the
inverse of grammatical productions based on local context. In this regard the DRAGON system
shares some of the advantages of the top-down recognition systems. On the other hand, the
present implementation is limited to a finite state space, so MAKGRM translates any context-free
grammar to a related finite state grammar.

(2)  Hierarchical  arrangement  of  knowledge  sources 

The arrangement of the knowledge sources into a conceptual hierarchy simplifies the
implementation of the DRAGON system by allowing a modularity that separates the details of the
representation of the knowledge sources from the recognition program. In this simple
implementation this modularity is expressed in the fact that MAKGRM, MAKDIC, MAKNET, GETPRB, and
DRAGON are independent programs with well-defined communication. In a more sophisticated
implementation the modularity could progress even further and would be even more valuable.
The hierarchical arrangement is also reflected in the sparseness of the transition matrix for the
integrated process. This sparseness has played an important role in this implementation of the
DRAGON system. The explicit network representation allows us to directly access the non-zero
entries of the transition matrix, thus avoiding unnecessary computations in the formal equation
(II.18). The bit-packed representation of the back pointers allows the entire recognition
computation to be performed using core memory.

(3)  Integrated  network  representation 

This  implementation  of  the  DRAGON  system  integrates  the  segmentation  and  labeling  into 
the  hierarchy,  so  the  optimal  search  algorithm  performs  the  segmentation  and  labeling  along  with 
the word identification and parsing. A price is paid in terms of the amount of computation time
because the underlying Markov process steps once for every 10 millisecond segment, rather than
once for every phone-like segment. However, even this simple implementation can show the
advantage of an integrated system compared to a system attempting to make decisions based on
any one knowledge source in isolation. The help which the recognition procedure gets from other
sources of knowledge allows the segmentation and labeling to be done reliably even with the crude
acoustic parameters and simple metric used in GETPRB.

(4)  General  theoretical  framework 

The presence of a general theoretical framework greatly simplified the implementation of the
DRAGON system. It is this feature which has made it possible to construct a complete speech
recognition system with limited manpower. It has been necessary to compromise the theoretical
framework in a few places (notably the GETPRB procedure and the lexical model), but in general
there has been much less special purpose programming than there would have been without the
abstract model. The abstract model has been sufficiently flexible that very few compromises have
been necessary in deciding what knowledge to represent (with the important exception of semantic
knowledge, which has been omitted entirely). The only significant example is that the grammar
represented in the network is a finite state grammar rather than a general context-free grammar.
This restriction has not been a significant handicap for the 5 tasks which have been implemented
so far.




(5)  Optimal  stochastic  search 

The optimal search strategy is probably the most distinctive feature of the DRAGON system. It
has a significant disadvantage in requiring extra computation. However, the special features of the
Markov model allow an optimal search algorithm for which the amount of computation is not
nearly as great as might naively be supposed. This implementation of the DRAGON system,
despite many drawbacks and simplifications, has shown that an optimal search is possible and
practical.

The  advantages  of  optimal  stochastic  search  come  from  avoiding  early  decisions  which  might 
be  wrong.  By  extending  all  partial  paths  in  parallel  we  are,  in  effect,  delaying  all  decisions  until  all 
context,  past  and  future,  has  been  considered.  The  amount  of  "context"  is  determined  by  the 
formulation of the Markov state space. In the highly stylized grammars used in these interactive
computer  tasks,  the  "context"  often  reaches  all  the  way  back  to  the  beginning  of  the  utterance. 
Thus  the  optimal  search  strategy  may  delay  the  decision  about  the  first  word  of  the  utterance  until 
the  effect  of  this  decision  on  the  entire  sentence  has  been  considered. 
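
The difference between deciding early and extending all partial paths in parallel can be illustrated with a small sketch. The code below is a standard Viterbi-style recursion written for this illustration, not the DRAGON program itself; the function names, the toy network, and all probabilities are hypothetical. It visits only the arcs that exist in the network, in the spirit of the sparse network representation discussed under (2), and it keeps one back pointer per state and frame.

```python
import math

NEG_INF = float("-inf")

def viterbi(arcs, start, log_obs):
    """Best state path, visiting only the non-zero arcs of the network.

    arcs    : list of (source, destination, log transition probability)
    start   : state occupied before the first frame
    log_obs : one dict per frame, mapping state -> log-probability of that frame
    """
    score = {start: 0.0}   # best log score of any partial path ending in each state
    back = []              # back[t][s] = predecessor of s on its best path at frame t
    for frame in log_obs:
        new_score, pointer = {}, {}
        for src, dst, log_p in arcs:
            if src in score and dst in frame:
                candidate = score[src] + log_p + frame[dst]
                if candidate > new_score.get(dst, NEG_INF):
                    new_score[dst], pointer[dst] = candidate, src
        score = new_score
        back.append(pointer)
    state = max(score, key=score.get)   # the only decision, made after the last frame
    path = [state]
    for pointer in reversed(back[1:]):  # follow the back pointers to the first frame
        state = pointer[state]
        path.append(state)
    return path[::-1]


def greedy(arcs, start, log_obs):
    """Commit to the locally best successor at every frame (for comparison only)."""
    state, path = start, []
    for frame in log_obs:
        options = [(log_p + frame[dst], dst)
                   for src, dst, log_p in arcs if src == state and dst in frame]
        state = max(options)[1]
        path.append(state)
    return path


# Toy network: two competing words, each two states long (all values hypothetical).
arcs = [("start", "a1", math.log(0.5)), ("start", "b1", math.log(0.5)),
        ("a1", "a2", 0.0), ("b1", "b2", 0.0)]
log_obs = [{"a1": math.log(0.6), "b1": math.log(0.4)},
           {"a2": math.log(0.1), "b2": math.log(0.9)}]

print("greedy :", greedy(arcs, "start", log_obs))
print("optimal:", viterbi(arcs, "start", log_obs))
```

In this toy case the first frame slightly favors word "a", so the greedy procedure commits to a1 and is then forced into a2. The parallel search keeps both partial paths alive and chooses b1, b2 once the second frame has been seen, so the decision about the first state is effectively made last.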

FUTURE  WORK 

There are many improvements which can be made even within the framework of the current
system. The introduction of a sophisticated acoustic preprocessor, while departing from the
philosophy of building an entire system from the same abstract model, would result in a significant
increase in computational speed. The techniques for using such a preprocessor within the general
DRAGON system are described in Chapter III (equations (9), (10), and (11)).

The lexical model could be improved either by introducing phonological rules or by using the
general lexical model of Chapter III. Either model could be trained using the procedure
represented by equations (21) and (22) of Chapter II.

The syntactic-semantic model would be improved by introducing estimates of the conditional
probability distributions into the grammar. Given a task with a known grammar, this estimation
mainly involves the collection of statistics for a large corpus of utterances from a dialogue in the
interactive computer task. Even for a task with an unspecified grammar, an attempt can be made
to approximate the grammar using the re-estimation procedure of equations (21) and (22) of
Chapter II.
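
For the case of a known grammar, the collection of statistics amounts to counting how often each transition of the grammar is used and normalizing the counts. The following Python sketch assumes that every utterance in the corpus has already been parsed, so that its sequence of grammar states is observed directly; the function name and the four-utterance corpus are hypothetical. It does not implement the re-estimation procedure of equations (21) and (22), which is needed when the state sequence is not observed.

```python
from collections import Counter, defaultdict

def estimate_transitions(sequences):
    """Relative-frequency estimate of P(next | current) from observed sequences."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    return {current: {nxt: n / sum(c.values()) for nxt, n in c.items()}
            for current, c in counts.items()}


# Hypothetical corpus: each utterance is the sequence of grammar states
# (here simply words) that it passed through.
corpus = [
    ["<start>", "USE", "HAMMING", "WINDOW", "<end>"],
    ["<start>", "USE", "HAMMING", "WINDOW", "<end>"],
    ["<start>", "USE", "HANNING", "WINDOW", "<end>"],
    ["<start>", "DISPLAY", "SPECTROGRAM", "<end>"],
]
probs = estimate_transitions(corpus)
print(probs["USE"])
print(probs["<start>"])
```

With a corpus this small the estimates are of course unreliable; the point is only that a transition never observed receives no entry, which is one reason a large corpus is needed.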

The assumption of a finite state space (and hence a finite state grammar) is not essential.
Markov processes may have infinite state spaces, and much of the theory used here carries
through. There are serious problems which must be solved to obtain a practical implementation,
but they are not insurmountable. For example, equation (18) of Chapter II can be generalized to
apply to an arbitrary context-free grammar, at the expense of making the number of computations
proportional to T^3 rather than to T. By segmenting the utterance into syllables, T would be the
number of syllables and T^3 might not be too large.
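
A rough calculation shows why the choice of unit matters for a T^3 procedure. The utterance length and speaking rate below are assumed values chosen only for illustration; the 10 millisecond frame is the sampling interval used in this implementation.

```python
frame_ms = 10          # one acoustic parameter sample every 10 milliseconds
utterance_s = 3.0      # assumed utterance length in seconds
syllables_per_s = 4    # assumed speaking rate

t_frames = round(utterance_s * 1000 / frame_ms)
t_syllables = round(utterance_s * syllables_per_s)

for label, t in (("10 ms frames", t_frames), ("syllables", t_syllables)):
    print(f"{label:>12}: T = {t:4d}   T^3 = {t ** 3:,}")
```

Under these assumptions T drops from 300 to 12, and T^3 from 27,000,000 to 1,728.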

What general implications can be drawn from the results of the DRAGON speech recognition
system? The DRAGON system differs from most other speech recognition systems in three
important ways: (1) the use of Markov models, (2) the use of the same abstract model to represent
each of the knowledge sources, and (3) the optimal search strategy.

Since the state space can be formulated to include specific context information, the
assumption of the Markov property in the models is not so much an assumption as it is a prescription to be
followed in the formulation of the state space. The results for this simple implementation
demonstrate that this prescription can be followed well enough to get reasonable recognition while
keeping the state space of manageable size. However, because the FORMANT task took 173.X
times real time and because the size of the DRAGON network grows with the size of the
vocabulary, there is a significant area for future research. Techniques need to be developed which can
more efficiently represent more complex tasks.

The use of a general abstract model has greatly facilitated the development of the DRAGON
system and has important implications. Lowerre ([L3]) has been able to analyze the main
recognition program to produce an optimized program which produces identical results but is much
faster than the original program. Work is being done to adapt the DRAGON system to run on a
minicomputer. Newell ([N3]) has suggested that the simplicity of the DRAGON system would
allow it to be used as a "benchmark" system. Any more sophisticated system must justify its
greater complexity by recognizing speech either in less time or more accurately than the DRAGON
system.

A major motivation for constructing the DRAGON system has been to demonstrate that
speech recognition based on complete optimal search is practical. Clearly, however, a complete
search is not the most efficient procedure. The most important area for future research is to
develop techniques such that the complete Markov search is an upper bound on the amount of
computation, but such that much less computation time is used exploring parallel paths when the
correct path is clear.

"AA" - AA

"AE" - AE
"AH" - AH
"AO" - AO
"AU" - AA UH
"AY" - AA IH
"B" - B IY
"CH" - SH
"D" - D IY
"EH" - EH
"ER" - ER ER
"EY" - EH IH
"F" - EH F
"FILLER" -
"G" - G IY
"HH" - EH IH - SH
"I" - AA IH
"IH" - IH
"IY" - IY
"JH" - SH
"K" - K EH IH
"L" - EH L
"M" - EH M
"N" - EH N
"NULL" -
"NX" - IH NX
"OU" - OU
"OY" - AO IH
"P" - P IY
"R" - AA ER
"S" - EH S
"SH" - SH
"T" - T IY
"UH" - UH
"UU" - UU
"V" - V IY
"WH" - UH
"Y" - U AA IH
"Z" - S IY
"ZH" - SH
'S - S
A - AX
ABOUT - AX - B AH - T
ABOVE - AX - B AH V
ABSOLUTE - AE - B S AX L UU - T
ABSOLUTE - AE - B S OU L UU - T
ACOUSTIC - AX - K UU S - T IH - K
ADC - EH IH - D IY S IY
ADD - AE - D
ADVANCED - AE - D V AE N - S - T
AFRAID - AX F ER EH IH - D
AIRPLANE - EH ER - P L EH IH N
AIRPLANES - EH ER - P L EH IH N - S
ALL - AO L
ALPHA - AH L F AX
AN - AE N
AN - AX N
ANALYSIS - AX N AE L IH S IH S
ANALYZE - AE N L AH IH S
AND - AX N - D
ANESTHETIZED - AX N EH S - T AX S AX -
ANOTHER - AH N AH F ER
ARE - AH ER AX
AS - AE S
ASPIRATED - AE S - P IH ER EH IH - T
ASPIRATION
ASTHMA
AT
ATAL
ATTACHED
AUTOCORRELATION
AWFUL
BABY
BACK
BACKED
BAD
BAIRSTOW
BAKER
BALL
BALLED
BALLS
BANDWIDTH
BARRED
BECOMES
BEEN
BEGINNING
BENT
BETA
BIRD
BISHOP
BISHOP'S
BLACKWELL
BLEEDING
BOTTLE
BOUNDARY
BOY
BURST
BY
CALCULATE
CAPTURES
CASTLE
CASTLES
CASTRATED
CAT
CATEGORY
CEILING
CENTER
CENTISECONDS
CENTRALIZED
CEPSTRAL
CEPSTRALLY
CEPSTRUM
CHANGE
CHECK
CHEST
CHICKEN-POX
CHINA
CHURCH
CIGARETTES
CIRCUMCISED
CLOUDY
CLUSTERING
COEFFICIENTS
COMMA
COMPARE
COMPILE
COMPUTE
CONSIDER
CONSTRUCTION
CONTINUOUS


RE 

RE 

RE 

RH 

RR 

RO 

RO 


S - P IH  ER  RR  IH  SH  AX  N 
S n AX 

- T 

- T RR  L 

- T RE  - SH  - T 


X RO  ER  EH  L EH  IH  SH  RX  N 


T OU 
RH  L 

- 8 EH  IH  - B IV 

- B RE  - X 

- B RE  - X - 0 

- B RE  - 0 

- B RE  ER  S - T OU 

- B EH  IH  - X ER 

- B RR  L 

- B RR  L - 0 

- B RR  L S 

-baen-ouih-of 

- B RR  ER  - 0 
-BRX-XAHT1S 

- B RX  N 

- 8 IY  - G IH  N IH  NX 

- B EH  N - T 

- B EH  IH  - T RH 

- B ER  - 0 

- B IH  SH  RX  - p 

- B IH  SH  RX  - P S 

- B L RE  - 1 U EH  l 

- B L 1Y  - 0 IH  NX 

- B RR  - T L 

- B RE  RR  N - 0 ER  IY 

- B RO  IH 

- B ER  S - T 

- B RR  IH 

RE  L - X Y UU  L EH  IH  - T 
RE  - P - SH  ER  S 
S L 
S L S 

S - T ER  EH  IH  - T RX  - 0 

- T 

T RX  - G RO  ER  IY 
IH  NX 
- T ER 

-TIHSEH-XRXN-OS 
T ER  L RR  IH  S - 0 
P S - T ER  L 

T ER  L IY 
T ER  RH  H 


X 
X 
X 
X 
X 
X 
X 

s 
s 
s 
s 

X 
X 
X 

SH  EH 


RE 
RE 
RE 
AE 
RE 
IY 
EH 
EH 
EH 
EH 

EH  - P S - 
EH  - P S - 
N - G 


- SH  EH  - X 

- SH  EH  S - T 

- SH  IH  - X RX  N - P RR  _ x S 

- SH  RA  IH  N RX 

- SH  ER  - SH 

- S IH  - C ER  EH  - T S 

- S RX  ER  - X RH  H S RX  - S - 0 

- X L OR  UU  - 0 IY 

- X L RH  S - T ER  IH  NX 

- X OU  EH  F IH  SH  IH  N - T S 

- X RH  H RX 

- X RH  T1  - P RE  ER 

- X RH  N - P RR  |H  L 
-XRHU-PYUU-T 

- X RH  N - S IH  - 0 ER 

- X RX  N - S - T ER  RH  - X SH  RX  N 

- X RX  N - T IH  X Y I)"  RX  S 

COVARIANCE - K OU V AE ER IY AE N - S
CRAMPS - K ER AE M - P S
CREAM - K ER IY M
CREF - K ER EH F
CURSOR - K ER S ER
CUTOFF - K AH - T AO F
CYCLES - S AA IH - K L S
DB - D IY - B IY
DEAD - D EH - D
DEBUG - D IY - B AA - G
DEBUGGING - D IY - B AX - G IH NX
DECIBELS - D EH S IH - B EH L S
DECIMAL - D EH S M L
DELETE - D AX L IY - T
DELTA - D EH L - T AH
DENTALIZED - D EH N - T L AA IH S - D
DEPRESSED - D IY - P ER EH S - D
DERIVATION - D AE ER IH V EH IH SH AX N
DESIGNING - D AX S AA IH N IH NX
DESIRE - D IH S AA IH ER
DETAIL - D IY - T EH IH L
DID - D IH - D
DIFFERENT - D IH F ER N - T
DIGITAL - D IH - G IH - T L
DISPLAY - D AX S - P L EH IH
DIVIDE - D IH V AA IH - D
DIVIDES - D IH V AA IH - D S
DIZZINESS - D IH S IY N AX S
DO - D UU
DOG - D AO - G
DOING - D UU IH NX
DOMAIN - D OU M EH IH N
DONE - D AH N
DOUBLE-U - D AH - B L Y UU
DOWN - D AA UH N
DRINK - D ER IH NX - K
DYNAMIC - D AA IH N AE M IH - K
EACH - IY - T SH
EASY - IY S IY
EDITING - EH - D IH - T IH NX
EIGHT - EH IH - T
EIGHTEEN - EH IH - T IY N
EIGHTY - EH IH - T IY
ELEVATED - EH L EH V EH IH - T EH - D
ELEVEN - IY L EH V AX N
EN PASSENT - AA N - P AA S AA N
END - EH N - D
ENHANCEMENT - AX N HH AE N S - M AX N - T
EPSILON - EH - P S IH L AA N
ESTIMATION - EH S - T IH M EH IH SH AX N
EVER - EH V ER
EXECUTE - EH - K S AX - K AA UH - T
EXTRA - EH - K S - T ER AX
FACT - F AE - K - T
FACTOR - F AA - K - T AO ER
FANT - F AA N - T
FAST - F AE S - T
FATHER - F AA DH ER
FATHOM - F AE F AX M
FEATHER - F EH DH ER
FEATURE - F IY - T SH ER
FEVER - F IY V ER
FEVERISH - F IY V ER IH SH
FFT - EH F EH F - T IY
FIFTEEN - F IH F - T IY N

FIFTY - F IH F - T IY
FILE - F AA IH L
FILTER - F IH L - T ER
FILTERED - F IH L - T ER - D
FINAL - F AA IH N L
FIND - F AA IH N - D
FINDING - F AA IH N - D IH NX
FIRST - F ER S - T
FIVE - F AA AX V
FLAP - F L AE - P
FLOOR - F L AO ER
FOOL - F UU L
FOR - F AO ER
FORMANT - F AO ER M AE N - T
FOUR - F AO U ER
FOURIER - F AO ER IY EH IH
FOURTEEN - F AO ER - T IY N
FOURTY - F AO ER - T IY
FRANCE - F ER AE N - S
FREQUENCY - F ER IY - K U EH N - S
FREQUENTLY - F ER IY - K U AX N - T L
FRICTIONAL - F ER IH - K SH AX N L
FRONTED - F ER AH N - T EH - D
FUNCTION - F AH N - K SH AX N
GAMMA - G AE M AH
GET - G EH - T
GETS - G EH - T S
GIVE - G IH V
GLOTTAL - G L AA - T L
GO - G OU
GOES - G OU S
GOES-TO - G OU S - T AX
GOING - G OU IH NX
GONORRHEA - G AA N ER IY AX
GRAMMAR - G ER AE M ER
GRAMMATICAL - G ER AX M AE - T IH - K
GRAPHICS - G ER AE F IH - K S
GRASS - G ER AE S
HAD - HH AE - D
HAMMING - HH AE M IH NX
HANNING - HH AE N IH NX
HAVE - HH AE V
HEAD - HH EH - D
HEADACHES - HH EH - D IH AX - K S
HEADLINES - HH EH - D L AA IH N - S
HELLO - HH EH L OU
HERE - HH IH ER
HERTZ - HH ER - T S
HIGH - HH AA IH
HIJACKING - HH AA IH - SH AE - K IH
HILBERT - HH IH L - B ER - T
HOSPITALIZED - HH AA S - P AX L AX S -
HOW - HH AA U
HUNDRED - HH AH N - D ER EH - D
HYPOTHESIS - HH AA IH - P AA F IH S
I - AA IH
ICE - AA IH S
ILL - IH L
IMAGE - IH M IH - SH
IMAGINARY - IH M AE - G IH N AE ER
IMMUNIZED - IH M Y UU N AX S - D
IN - IH N
INCREMENT - IH N - K ER AX M EH N -
INITIAL - IH N IH SH L
INJURED - IH N - SH ER - D

INSERT - IH N - S ER - T
INSTANCE - IH N - S - T AE N - S
INTERACTIVE - IH N - T ER AE - K - T IH V
INTO - IH N - T UU
INVERSE - IH N V ER S
IS - AX S
ISRAEL - IH S ER IY L
IT - IH - T
ITAKURA - IH - T AH - K ER AH
JAMES - SH EH IH M S
JUDGE - SH AH - D - SH
KING - K IH NX
KING'S - K IH NX S
KNIGHT - N AA IH - T
KNIGHT'S - N AA IH - T S
LABEL - L EH IH - B L
LABELING - L EH IH - B L IH NX
LABELS - L EH IH - B L S
LARYNGEALIZED - L AA ER IH N - G L AA IH S - D
LEARN - L ER N
LEFT - L EH F - T
LENGTH - L AX NX - K
LESION - L IY S AX N
LESIONS - L IY S AX N - S
LET - L EH - T
LILY - L IH L IY
LINEAR - L IH N IY ER
LION - L AA IH UH N
LIP - EH L AH IH - P IY
LIST - L IH S - T
LITERAL - L IH - T ER L
LOAD - L OU - D
LOCALIZED - L OU - K L AA IH S - D
LOG - L AO - G
LOGARITHM - L AO - G AE ER IH F M
LONG - L AO NX
LOOK - L UH - K
LOW - L OU
LOWERED - L OU ER - D
LPC - EH L - P IY S IY
MARKEL - M AA ER - K L
MARKING - M AA ER - K IH NX
MATE - M EH IH - T
MAX - M AE - K S
MAY - M EH IH
ME - M IY
MEASLES - M IY S L S
MEASURE - M EH SH ER
METHOD - M EH F AH - D
METHODS - M EH F AH - D S
MICROSECONDS - M AA IH - K ER OU S EH - K AX N
MILD - M AA IH L - D
MILLION - M IH L IH AX N
MILLISECONDS - M IH L IH S EH - K AX N - D S
MIN - M IH N
MINUS - M AA IH N AH S
MOD - M AA - D
MODIFIER - M AA - D IH V AA IH ER
MOM - M AA M
MOVE - M UU V
MOVES - M UH V S
MOVES-TO - M UU V S - T AX
MUCH - M AO - SH
MUMPS - M AX M - P S
MURDER - M ER - D ER

NASALIZED - N EH IH S L AA IH S - D
NAUSEA - N AO AH SH AX
NEGATE - N AX - G EH IH - T
NETWORK - N EH - T U ER - K
NEW - N UU
NEWTON - N UU - T AX N
NINE - N AA IH N
NINETEEN - N AA IH N - T IY N
NINETY - N AA IH N - T IY
NIXON - N IH - K S AX N
NOBODY - N OU - B AH - D IY
NON-SPEECH - N AA N - S - P IY - SH
NOW - N AA UU
NUMBER - N AH M - B ER
NUMBNESS - N AH AX M N AX S
NUTS - N AX - T S
OBOE - OU - B OU
OCTAL - AA - K - T L
OCTAVE - AA - K - T EH V
OF - AO V
OF - AX V
OFTEN - AO AH F AX N
ON - AO N
ONE - U AH N
OPERATION - AH - P ER AE IY SH AX N
OR - AO ER
ORDER - AO ER - D ER
OVEREAT - OU V ER IY - T
PAIN - P AX IH N
PAINS - P AX IH N S
PALATALIZED - P AE L AE - T L AA IH S - D
PARAMETER - P AX ER AE M EH - T ER
PARAMETERS - P ER AE M AX - T ER S
PART - P AA ER - T
PASS - P AE S
PAWN - P AO N
PEAK - P IY - K
PEAKS - P IY - K S
PER - P ER
PERIOD - P IH ER IY AX - D
PHONE - F OU N
PHONEME - F OU N IY M
PHONEMIC - F AX N IY M IH - K
PHONETIC - F AX N EH - T IH - K
PHRASE - F ER EH IH S
PICKING - P IH - K IH NX
PITCH - P IH - T SH
PLOT - P L AA - T
PLUS - P L AH S
POINTS - P AO IH N - T S
POP - P AA - P
POSITION - P AX S IH SH AX N
POSITIONS - P AX S IH SH AX N - S
POST-EMPHASIS - P OU S - T EH M F AH S IH S
POT - P AA - T
POWER - P AA U ER
PRE-EMPHASIS - P ER IY EH M F AH S IH S
PREDICTION - P ER IY - D IH - K SH AX N
PREDICTIVE - P ER AX - D IH - K - T IH V
PRESENT - P ER EH S EH N - T
PRIMARY - P ER AA IH M EH ER IY
PRONY - P ER OU N IY
PROTOCOL - P ER OU - T OU - K AO L
PUP - P AH - P
PUT - P UH - T


Q
QUEEN
QUEEN'S
RABINER
RAISED
RAPE
RATING
REAL
RECTANGULAR
REDUCED
RELEASED
REQUEST
RESOLUTION
RETRACTED
RETROFLEXED
RIGHT
ROAR
ROBINSON
ROOK
ROOK'S - ER UH - K S
ROOT - ER UU - T
ROOTS - ER UU - T S
ROSES - ER OU S IH S
ROUNDED - ER AA UH N - D EH - D
RUSSIA - ER AX SH AX
SAY - S EH IH
SCALE - S - K EH IH L
SCHAFER - SH EH IH F ER
SCHWA - SH U AA
SECOND - S EH - K AH N - D
SECONDARY - S EH - K AH N - D EH ER IY
SECTION - S EH - K SH AX N
SEE - S IY
SEGMENT - S EH - G M AX N - T
SEGUE - S EH - G U EH IH
SENTENCE - S EH N - T IH N - S
SERIOUS - S IH ER IY AX S
SEVEN - S EH V AX N
SEVEN - S EH V EH N
SEVENTEEN - S EH V EH N - T IY N
SEVENTY - S EH V EH N - T IY
SEVERE - S AX V IH ER
SEX - S EH - K S
SHARP - SH AH ER - P
SHORT - SH AO ER - T
SHOULD - SH UH - D
SHOW - SH OU
SICK - S IH - K
SIDE - S AA IH - D
SILENCE - S AA IH L EH N - S
SIMULATION - S IH M Y UU L EH IH SH AX N
SING - S IH NX
SISTER - S IH S - T ER
SIT - S IH - T
SIX - S IH - K S
SIXTEEN - S IH - K S - T IY N
SIXTY - S IH - K S - T IY
SLASH - S L AE SH
SMOKE - S M OU - K
SMOOTHED - S M UU F - D
SMOOTHING - S M UU F IH NX
SPEAKER - S - P IY - K ER
SPECIFICATION - S - P EH S IH F IH - K EH IH SH AX N
SPECTRAL - S - P EH - K - T ER L
SPECTROGRAM - S - P EH - K - T ER OU - G ER AE M
SPECTRUM - S - P EH - K - T ER AX M

SPEECH - S - P IY - T SH
START - S - T AA ER - T
STARTING - S - T AA ER - T IH NX
STATE - S - T EH IH - T
STEADY - S - T EH - D IY
STEPS - S - T EH - P S
STOP - S - T AA - P
STORE - S - T AO ER
STORIES - S - T AO ER IY S
STRESS - S - T ER EH S
SUB-PHONETIC - S AH - B F AX N EH - T IH - K
SUBSEQUENT - S AH - B S EH - G U EH N - T
SUDDEN - S AH - D AX N
SUMMARY - S AX M ER IY
SURGERY - S ER - SH ER IY
SYLLABIC - S IH L AE B IH - K
SYMBOL - S IH M - B AO L
SYNTHESIS - S IH N F AX S IH S
TAKE - T EH IH - K
TAKES - T EH IH - K S
TASK - T AE S - K
TELL - T EH L
TEN - T EH N
TERTIARY - T ER SH IY EH ER IY
TESTING - T EH S - T IH NX
THAT - DH AE - T
THE - DH AX
THETA - F EH IH - T AX
THIN - F IH N
THIRD - F ER - D
THIRTEEN - F ER - T IY N
THIRTY - F ER - T IY
THORN - F AO ER N
THOUSAND - F OU S AE N - D
THREE - F ER IY
TIME - T AA IH M
TIMES - T AA IH M S
TITLE - T AA IH - T L
TO - T AX
TRACKING - T ER AE - K IH NX
TRACKS - T ER AE - K S
TRAIN - T ER EH IH N
TRANSCRIPTION - T ER AE N - S - K ER IH - P SH
TRANSFORM - T ER AE N - S F AO ER M
TRANSITION - T ER AE N - S IH SH AX N
TRIANGULAR - T ER AA IH EH IH N - G Y UU L
TRILLED - T ER IH L - D
TUBERCULOSIS - T UU - B ER - K Y UU L OU S AX
TWELVE - T U EH L V
TWENTY - T U EH N - T IY
TWO - T UH
TWO - T U UU
UN-STRESSED - AH N - S - T ER EH S - D
UNROUNDED - AH N ER AA UH N - D EH - D
UNTIL - AH N - T IH L
URINE - Y ER AX N
US - AH S
USE - Y UU S
USING - Y UU S IH NX
UTTERANCE - AH - ER EH N - S
VALUE - V AE L Y UU
VEAL - V IY L
VELARIZED - V IY L AA ER AA IH S - D
VIETNAM - V IH EH - T N AE M

VOICED 

- v no  ih  s - o 

52200 

VOICELESS 

- V no  1H  S L EH  S 

52300 

U 

- o nn  - b L nn  uh 

52400 

uncoN 

- u nt  - c nx  h 

52500 

WONT 

- w nn  n . r 

52600 

UOR 

- u no  ER 

52?00 

URIERCOTE 

- u no  - T ER  _ G OE  IH  - T 

52600 

UnVEEORM 

- U EH  |H  V F PO  ER  fl 

52900 

HE 

- U IV 

53000 

WEIGH

- u no  nx 

53100 

HERE 

- II  ER 

53200 

mtnr 

- w mi  - t 

53300 

WHEN

- w nx  n 

63400 

WHERE 

- W HE  ER 

53500 

WHICH 

- WH  IH  - SH 

53600 

WINDOW

- U IH  N - o OU 

53’00 

WITH 

- W IH  E 

53600 

WORD

- W FR  - 0 

53600 

X 

- EM  - r S 

54000 

Y 

- w nn  ih 

54100 

>ELLOU 

•um  ow 

54  200 

YES 

- Y EH  S 

c4300 

YOU 

- y nx 

54400 

YOUR 

- Y IR 

5.500 

7 

- S IY 

'•iCOJ 

ZERO 

- S IH  ER  OU 

54  ’00 

ZOO 

- S UU 

54  000 

I 

. 

54900 

1 

. 

! SUB-GRAMMAR FOR FORMANT TRACKING SUB-TASK.

<!form-sent>::=     [ <!request> ]

<!request>::=       <!desire-sent>
                    <!param-sent>
                    <!command>
                    <!intro><!command>

<!desire-sent>::=   I WANT TO DO <!task>

<!task>::=          FORMANT TRACKING
                    TIME DOMAIN ANALYSIS
                    PITCH MARKING
                    PHONETIC BOUNDARY MARKING
                    PHONETIC LABELING
                    PHONETIC TRANSCRIPTION
                    ACOUSTIC FEATURE LABELING
                    GRAMMATICAL CATEGORY DERIVATION
                    GRAMMAR SPECIFICATION
                    NETWORK EDITING
                    PARAMETER TESTING
                    DEBUGGING
                    SIMULATION
                    HYPOTHESIS RATING
                    FACTOR ANALYSIS
                    CLUSTERING
                    DISPLAY CONSTRUCTION
                    SPEECH SYNTHESIS
                    DIGITAL FILTER DESIGNING

<!param-sent>::=    USE <!param-phr>

<!command>::=       <!compute><!func-phr>
                    <!compute><!func-phr> USING <!meth-type> METHOD
                    <!plot><!plot-item>
                    <!compare><!alter-list>
                    INCREMENT THE <!incre-spec> <!incre-prep> <!nine-digit> POINTS

<!intro>::=         I WANT TO
                    FOR EACH <!iter-item>

<!iter-item>::=     PHRASE
                    PHONE
                    PHONEME
                    SEGMENT
                    WINDOW
                    FUNCTION
                    TIME
                    POSITION
                    SENTENCE
                    UTTERANCE

<!param-phr>::=     <!param-spec>
                    <!param-phr><!prep><!param-spec>

<!param-spec>::=    FILE NUMBER <!nine-digit>
                    UTTERANCE NUMBER <!nine-digit>
                    A <!wind-type> WINDOW OF <!nine-digit> POINTS
                    A <!freq-spec> OF <!nine-digit> HERTZ
                    A <!res-type> RESOLUTION OF <!nine-digit><!res-unit>
                    <!nine-digit> COEFFICIENTS
                    AN ORDER OF <!nine-digit>
                    START TIME <!num>
                    END TIME <!nine-digit>
                    A <!emph-type> OF <!nine-digit><!db> PER OCTAVE
                    A SCALE FACTOR OF <!nine-digit>
                    A FLOOR OF <!num>
                    A CEILING OF <!nine-digit>

<!prep>::=          OF
                    TO
                    WITH
                    ON

<!wind-type>::=     HAMMING
                    HANNING
                    BLACKWELL
                    RECTANGULAR
                    TRIANGULAR

<!freq-spec>::=     FREQUENCY
                    <!freq-type> FREQUENCY
                    BANDWIDTH

<!freq-type>::=     CENTER
                    CUTOFF
                    LOW PASS
                    HIGH PASS

<!meth-type>::=     <!name>S
                    THE <!meth-kind>

<!name>::=          ITAKURA
                    MARKEL
                    PRONY
                    ATAL
                    ROBINSON
                    SCHAFFER AND RABINER
                    FANT
                    NEWTON
                    BAIRSTOW

<!meth-kind>::=     AUTOCORRELATION
                    COVARIANCE
                    PEAK PICKING
                    ROOT FINDING

<!res-type>::=      TIME
                    FREQUENCY

<!res-unit>::=      HERTZ
                    CYCLES PER SECOND
                    MICROSECONDS
                    MILLISECONDS
                    CENTISECONDS
                    POINTS

<!emph-type>::=     PRE-EMPHASIS
                    POST-EMPHASIS

<!db>::=            DECIBELS
                    DB

<!compute>::=       COMPUTE
                    CALCULATE
                    FIND
                    GET
                    TAKE
                    CONSIDER

<!func-phr>::=      THE <!comp-func>
                    THE AUTOCORRELATION FUNCTION
                    THE COVARIANCE FUNCTION
                    THE FFT
                    THE FAST FOURIER TRANSFORM
                    THE FOURIER TRANSFORM
                    THE HILBERT TRANSFORM
                    THE LINEAR PREDICTION COEFFICIENTS
                    THE LINEAR PREDICTION FILTER
                    THE INVERSE FILTER
                    THE SPECTRUM
                    THE CEPSTRUM
                    THE <!spec-adj> SPECTRUM
                    THE ROOTS

<!comp-func>::=     <!func-part>
                    <!func-part> OF <!func-phr>

<!func-part>::=     ROOTS
                    PEAKS
                    IMAGINARY PART
                    REAL PART
                    LOGARITHM
                    ABSOLUTE VALUE

<!plot>::=          PLOT
                    DISPLAY
                    SHOW

<!plot-item>::=     THE SPECTROGRAM
                    THE SPECTROGRAM <!prep><!param-phr>
                    THE WAVEFORM
                    THE FORMANT TRACKS
                    THE FUNCTION
                    <!func-phr>

<!spec-adj>::=      SMOOTHED
                    <!smth-meth> SMOOTHED
                    <!spec-meth>

<!spec-meth>::=     CEPSTRAL
                    LINEAR PREDICTIVE
                    INVERSE FILTERED
                    FFT
                    FAST FOURIER TRANSFORM
                    FOURIER

<!smth-meth>::=     CEPSTRALLY
                    LINEAR PREDICTION
                    INVERSE FILTER
                    LPC

<!compare>::=       COMPARE
                    LOOK AT
                    CONSIDER

<!alter-list>::=    <!meth-type> METHOD <!meth-conj><!meth-type> METHOD
                    ANOTHER METHOD OF <!form-task>
                    <!form-task> METHODS
                    <!form-task> WITH DIFFERENT PARAMETERS

<!meth-conj>::=     AND
                    WITH

<!form-task>::=     FORMANT ESTIMATION
                    SPECTRAL SMOOTHING
                    IMAGE ENHANCEMENT
                    ROOT FINDING
                    LINEAR PREDICTION

<!incre-prep>::=    BY
                    IN STEPS OF

<!incre-spec>::=    WINDOW
                    STARTING TIME

! This is the number sub-grammar.
! It is used by most of the task sub-grammars.

<!num>::=           <!nine-digit>
                    ZERO

<!nine-digit>::=    <!six-digit>
                    <!three-digit> MILLION <!six-digit>

<!six-digit>::=     <!three-digit>
                    <!three-digit> THOUSAND <!three-digit>

<!three-digit>::=   <!two-digit>
                    <!digit> HUNDRED <!two-digit>
                    <!digit> HUNDRED

<!two-digit>::=     <!digit>
                    <!teen>
                    <!tens><!digit>
                    <!tens>

<!tens>::=          TWENTY
                    THIRTY
                    FOURTY
                    FIFTY
                    SIXTY
                    SEVENTY
                    EIGHTY
                    NINETY

<!teen>::=          TEN
                    ELEVEN
                    TWELVE
                    THIRTEEN
                    FOURTEEN
                    FIFTEEN
                    SIXTEEN
                    SEVENTEEN
                    EIGHTEEN
                    NINETEEN

<!digit>::=         ONE
                    TWO
                    THREE
                    FOUR
                    FIVE
                    SIX
                    SEVEN
                    EIGHT
                    NINE
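
The number sub-grammar is small enough to be turned directly into a parser. The Python sketch below is an illustration only and is not part of the original system. It assumes the rules for num, nine-digit, six-digit, three-digit, two-digit, tens, teen and digit read as in the listing; all function names (two_digit, parse_number, and so on) are hypothetical. Each helper returns a (value, next index) pair, or None when the rule does not match.

```python
# Terminal words of the number sub-grammar and their values.
DIGITS = dict(zip("ONE TWO THREE FOUR FIVE SIX SEVEN EIGHT NINE".split(), range(1, 10)))
TEENS = dict(zip("TEN ELEVEN TWELVE THIRTEEN FOURTEEN FIFTEEN SIXTEEN "
                 "SEVENTEEN EIGHTEEN NINETEEN".split(), range(10, 20)))
TENS = dict(zip("TWENTY THIRTY FORTY FIFTY SIXTY SEVENTY EIGHTY NINETY".split(),
                range(20, 100, 10)))
TENS["FOURTY"] = 40  # spelling of this terminal as it appears in the listing


def two_digit(w, i):
    """two-digit ::= digit | teen | tens digit | tens"""
    if i >= len(w):
        return None
    if w[i] in TEENS:
        return TEENS[w[i]], i + 1
    if w[i] in TENS:
        if i + 1 < len(w) and w[i + 1] in DIGITS:
            return TENS[w[i]] + DIGITS[w[i + 1]], i + 2
        return TENS[w[i]], i + 1
    if w[i] in DIGITS:
        return DIGITS[w[i]], i + 1
    return None


def three_digit(w, i):
    """three-digit ::= two-digit | digit HUNDRED two-digit | digit HUNDRED"""
    if i + 1 < len(w) and w[i] in DIGITS and w[i + 1] == "HUNDRED":
        hundreds = 100 * DIGITS[w[i]]
        tail = two_digit(w, i + 2)
        return (hundreds + tail[0], tail[1]) if tail else (hundreds, i + 2)
    return two_digit(w, i)


def six_digit(w, i):
    """six-digit ::= three-digit | three-digit THOUSAND three-digit"""
    head = three_digit(w, i)
    if head is None:
        return None
    value, j = head
    if j < len(w) and w[j] == "THOUSAND":
        tail = three_digit(w, j + 1)
        return (value * 1000 + tail[0], tail[1]) if tail else None
    return head


def nine_digit(w, i):
    """nine-digit ::= six-digit | three-digit MILLION six-digit"""
    head = three_digit(w, i)
    if head is not None and head[1] < len(w) and w[head[1]] == "MILLION":
        tail = six_digit(w, head[1] + 1)
        return (head[0] * 1_000_000 + tail[0], tail[1]) if tail else None
    return six_digit(w, i)


def parse_number(text):
    """num ::= nine-digit | ZERO; returns an int, or None if not in the grammar."""
    w = text.upper().replace("-", " ").replace(",", " ").split()
    if w == ["ZERO"]:
        return 0
    result = nine_digit(w, 0)
    return result[0] if result and result[1] == len(w) else None
```

Because the rules are followed literally, a phrase such as "two thousand" is rejected: the listing requires a three-digit group after THOUSAND.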
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08100 

! SYNTAX FOR AP VOICE NEWS QUERY SYSTEM.  20 TERMINAL SYMBOLS (WORDS)

<QUERY>::=          [ <REQUEST> ]

<REQUEST>::=        LET <PRONOUNA> HAVE <COLL-SUM>
                    GIVE <PRONOUNB><NOUN-PHRASE>
                    GIVE <PRONOUNB><COLL-SUM>
                    TELL <PRONOUNC><COLL-SUM>
                    TELL <PRONOUNC><QUANTIFIER><NOUN-PHRASE>
                    TELL <PRONOUNC><TELL-QUAN><SUM-PHRASE>

<COLL-SUM>::=       <SUM-PHRASE>
                    ALL <SUM-PHRASE>

<SUM-PHRASE>::=     THE <SUMMARIESB>
                    THE <SUMMARIESA> AND <SUMMARIESB>

<SUMMARIESA>::=     STORIES
                    HEADLINES
                    SUMMARY

<SUMMARIESB>::=     STORIES
                    HEADLINES
                    SUMMARY

<TELL-QUAN>::=      <QUANTIFIER>
                    ABOUT ALL
                    ALL

<PRONOUNA>::=       ME
                    US

<PRONOUNB>::=       ME
                    US

<PRONOUNC>::=       ME
                    US

<QUANTIFIER>::=     ALL ABOUT
                    ABOUT

<NOUN-PHRASE>::=    <NOUNA> AND <NOUNB>
                    <NOUNA> OR <NOUNB>
                    <NOUNB>

<NOUNA>::=          FRANCE
                    AIRPLANE HIJACKING
                    HIJACKING
                    CHINA
                    ISRAEL
                    MURDER
                    NIXON
                    RAPE
                    RUSSIA
                    SEX
                    AIRPLANES
                    VIETNAM
                    WAR
                    THE VIETNAM WAR
                    WATERGATE
                    THE WATERGATE

<NOUNB>::=          FRANCE
                    AIRPLANE HIJACKING
                    HIJACKING
                    CHINA
                    ISRAEL
                    MURDER
                    NIXON
                    RAPE
                    RUSSIA
                    SEX
                    AIRPLANES
                    VIETNAM
                    WAR
                    THE VIETNAM WAR
                    WATERGATE
                    THE WATERGATE
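
Since this grammar contains no recursion, every phrase it generates can be listed by expanding the rules. The Python sketch below is an illustration under assumptions, not code from the original system: it copies only the COLL-SUM, SUM-PHRASE, SUMMARIESA and SUMMARIESB rules as they read in the listing, and the names RULES and expand are hypothetical.

```python
from itertools import product

# A fragment of the news-query grammar: each rule maps to its alternatives,
# and each alternative is a sequence of terminals and non-terminals.
RULES = {
    "COLL-SUM": [["SUM-PHRASE"], ["ALL", "SUM-PHRASE"]],
    "SUM-PHRASE": [["THE", "SUMMARIESB"],
                   ["THE", "SUMMARIESA", "AND", "SUMMARIESB"]],
    "SUMMARIESA": [["STORIES"], ["HEADLINES"], ["SUMMARY"]],
    "SUMMARIESB": [["STORIES"], ["HEADLINES"], ["SUMMARY"]],
}


def expand(symbol):
    """Yield every word string derivable from `symbol` (finite grammars only)."""
    if symbol not in RULES:          # a terminal word stands for itself
        yield symbol
        return
    for alternative in RULES[symbol]:
        # Combine every expansion of each part of the alternative, in order.
        for parts in product(*(list(expand(s)) for s in alternative)):
            yield " ".join(parts)


phrases = list(expand("COLL-SUM"))
```

The same function would not terminate on a recursive rule such as the parameter phrases of the formant tracking sub-grammar, so it is suitable only for finite fragments like this one.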
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mu 

GRAMMAR FOR CHESS

<move>::=           [ <moveb> ]

<moveb>::=          <movea><check-word>
                    <movea>

<movea>::=          <pce-loc><motion><loca>
                    <pce-loc><takes><pce-loca>
                    <castle-move>

<pce-loc>::=        <piece>
                    <piece> ON <loc>

<loc>::=            <pieceb><square>

<pce-loca>::=       <piecec>
                    <piecec> ON <loca>

<loca>::=           <pieced><squarea>

<piece>::=          <royal>
                    <royal><man>
                    <man>

<man>::=            <bnr>
                    <bnr> PAWN
                    PAWN

<pieceb>::=         <royalb>
                    <royalb><manb>
                    <manb>

<manb>::=           <bnrb>
                    <bnrb> PAWN
                    PAWN

<piecec>::=         <royalc>
                    <royalc><manc>
                    <manc>

<manc>::=           <bnrc>
                    <bnrc> PAWN
                    PAWN

<pieced>::=         <royald>
                    <royald><mand>
                    <mand>

<mand>::=           <bnrd>
                    <bnrd> PAWN
                    PAWN

<royal>::=          KING
                    QUEEN

<bnr>::=            BISHOP
                    KNIGHT
                    ROOK

<royald>::=         KING
                    QUEEN

<bnrd>::=           BISHOP
                    KNIGHT
                    ROOK

<royalb>::=         KING
                    QUEEN

<bnrb>::=           BISHOP
                    KNIGHT
                    ROOK

<royalc>::=         KING
                    QUEEN

<bnrc>::=           BISHOP
                    KNIGHT
                    ROOK

<square>::=         ONE
                    TWO
                    THREE
                    FOUR
                    FIVE
                    SIX
                    SEVEN
                    EIGHT

<squarea>::=        ONE
                    TWO
                    THREE
                    FOUR
                    FIVE
                    SIX
                    SEVEN
                    EIGHT

<motion>::=         TO
                    MOVES-TO
                    GOES-TO

<takes>::=          TAKES
                    CAPTURES

<castle-move>::=    CASTLE
                    CASTLE ON <royale> SIDE
                    CASTLE <royale> SIDE

<royale>::=         KING
                    QUEEN

<check-word>::=     CHECK
                    MATE
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00108 

BNF FOR THE DOCTOR INTERVIEW.  7« TERMINAL WORDS.

<HEAD>::=           [ <SENTENCE> ]

<SENTENCE>::=       <INTEROGB> <HABIT-VERB>
                    <INTEROGC> <SYMPTOM>
                    <INTEROGD> <SYMPTOM> <ADJ>
                    <INTEROGE> <SYMPTOMS> <ADJ>
                    <INTEROGG> <PHYS-COND>
                    <INTEROGG> <PERSONAL-STATE>
                    <INTEROGH> <VERBA> <AILMENT>
                    <INTEROGH> <VERBB> <PARTICIPIAL>
                    <W> <INTEROGF> <PARTICIPIAL>
                    <INTEROGD> <PERSONAL-NOUN> <PERSONAL-ADJ>

<W>::=              WHERE
                    WHEN

<QUANTIFIER>::=     OFTEN
                    LONG
                    FREQUENTLY
                    MUCH

<INTEROGA>::=       HOW
                    HOW <QUANTIFIER>

<INTEROGB>::=       DO YOU
                    <INTEROGA> DO YOU

<INTEROGC>::=       WHERE IS THE

<INTEROGD>::=       IS THE
                    IS YOUR

<INTEROGE>::=       ARE THE
                    ARE YOUR

<INTEROGF>::=       WERE YOU
                    WERE YOU EVER

<INTEROGG>::=       ARE YOU
                    <INTEROGF>

<INTEROGH>::=       HAVE YOU
                    <INTEROGA> HAVE YOU

<VERBA>::=          HAD
                    EVER HAD

<VERBB>::=          BEEN
                    EVER BEEN

<HABIT-VERB>::=     SMOKE
                    DRINK
                    OVEREAT
                    SMOKE <SMOKEY-ADJ>

<SMOKEY-ADJ>::=     CIGARETTES
                    POT
                    GRASS

<SYMPTOM>::=        PAIN
                    NUMBNESS
                    NAUSEA
                    DIZZINESS
                    BLEEDING

<SYMPTOMS>::=       HEADACHES
                    PAINS
                    CRAMPS
                    CHEST PAINS
                    LESIONS

<AILMENT>::=        MUMPS
                    MEASLES
                    CHICKEN-POX
                    TUBERCULOSIS
                    ASTHMA
                    GONORRHEA
                    CLOUDY URINE
                    SURGERY
                    AN OPERATION

<ADJ>::=            SEVERE
                    MILD
                    BAD
                    CONTINUOUS
                    SHARP
                    SERIOUS

<PHYS-COND>::=      SICK
                    ILL
                    IN PAIN
                    FEVERISH
                    DEAD

<PERSONAL-STATE>::= AFRAID OF SURGERY
                    CASTRATED

<PERSONAL-NOUN>::=  URINE
                    HEAD

<PERSONAL-ADJ>::=   CLOUDY
                    ATTACHED

<PARTICIPIAL>::=    HOSPITALIZED
                    CIRCUMCISED
                    ANESTHETIZED
                    CASTRATED
                    AFRAID OF SURGERY
                    IMMUNIZED
                    INJURED
                    SERIOUS


<sentence>::=       [ <request> ]

<request>::=        COMPUTE <func-phr>
                    USE <param-phr>

<func-phr>::=       <function>
                    <function> USING <param-phr>

<function>::=       THE <name> TRANSFORM

<name>::=           HILBERT
                    FOURIER

<param-phr>::=      <param-spec>
                    <param-spec> WITH <param-phr>

<param-spec>::=     A LENGTH OF FIVE HUNDRED TWELVE POINTS
                    A HAMMING WINDOW
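
To make the meaning of this small grammar concrete, the following Python sketch tests whether a word string belongs to its language. It is an illustration only, not the network representation the system itself builds. It assumes the seven rules read as in the listing and that the brackets around the request stand for utterance-boundary symbols; the names GRAMMAR, ends and accepts are hypothetical.

```python
# Each non-terminal maps to its alternatives; anything else is a terminal word.
GRAMMAR = {
    "sentence": [["[", "request", "]"]],
    "request": [["COMPUTE", "func-phr"], ["USE", "param-phr"]],
    "func-phr": [["function"], ["function", "USING", "param-phr"]],
    "function": [["THE", "name", "TRANSFORM"]],
    "name": [["HILBERT"], ["FOURIER"]],
    "param-phr": [["param-spec"], ["param-spec", "WITH", "param-phr"]],
    "param-spec": [["A", "LENGTH", "OF", "FIVE", "HUNDRED", "TWELVE", "POINTS"],
                   ["A", "HAMMING", "WINDOW"]],
}


def ends(symbol, words, start):
    """Return every index at which `symbol` can end when it begins at `start`."""
    if symbol not in GRAMMAR:  # terminal: must match the next word exactly
        matched = start < len(words) and words[start] == symbol
        return {start + 1} if matched else set()
    found = set()
    for alternative in GRAMMAR[symbol]:
        positions = {start}
        for part in alternative:  # thread the reachable positions through each part
            positions = {e for p in positions for e in ends(part, words, p)}
        found |= positions
    return found


def accepts(text):
    """True if `text`, wrapped in boundary symbols, is a sentence of the grammar."""
    words = ["["] + text.upper().split() + ["]"]
    return len(words) in ends("sentence", words, 0)
```

The recursion terminates because the only recursive rule, param-phr, consumes at least one word before it calls itself. A left-recursive rule would need a different strategy.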

\ 
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181 

182 

44 

<aantanca>t ■■ 

1 

-1 

8 

C 2 

181 

1 

1 

1888 

<raquast> 

3 

-2 

1 

2 

1888 

) 4 

182 

1 

11 

1888 

ENOOF <*ant«nc«>  S 

-1 

1 

4 

1888 

<raquasl>tta 

6 

-2 

1 

2 

1888 

COMPUTE  7 

291 

1 

6 

1888 

<func-phr» 

8 

-3 

1 

7 

1888 

USE  9 

222 

1 

6 

1888 

<par«*-phr> 

18 

-6 

1 

9 

1888 

EN00F<raquact> 

11 

-2 

2 

17 

S88 

32 

588 

< * unc-phr»i  i ■ 

12 

-3 

1 

7 

1888 

< June  1 lor» 

13 

-4 

1 

12 

1888 

dune  t lon> 

14 

-4 

1 

12 

1888 

USING  IS 

2S2 

1 

22 

1888 

*paran-phr> 

16 

-6 

1 

IS 

1888 

EN00Fdunc-phr> 

17 

-3 

2 

22 

S88 

32 

S88 

* lunc  1 1 on>:  i • 

18 

-4 

2 

12 

S88 

12 

S08 

THE  19 

1S6 

1 

18 

1888 

<namt>  28 

-S 

1 

19 

1888 

TRnN'FOPn 

21 

388 

1 

26 

1888 

ENDOF  « (unct ion> 

22 

-4 

1 

21 

1888 

<nAfn*  >:  : s 

23 

-S 

1 

19 

1888 

HILBERT  24 

381 

1 

23 

1888 

FOURIER  25 

299 

1 

23 

1888 

ENOOF  <nan«> 

26 

-S 

2 

24 

S88 

25 

588 

<param-phr>i  i> 

27 

-6 

3 

9 

333 
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IS 

333 

38 

334 

<pir|» 

*p*c  > 

28 

-7 

27 

1888 

<p«r«»- 

«P»C> 

29 

-7 

27 

1888 

UITH 

31 

251 

1 

44 

1888 

<p*r—  phr  > 

31 

-6 

31 

1888 

ENOOf  <p«fM  phr> 

32 

44 

588 

32 

588 

<p«r«»- 

•PIOI  1 

> 33 

-7 

27 

588 

27 

588 

0 

34 

1 

1 

33 

1888 

LENGTH 

35 

565 

1 

34 

1888 

Of 

36 

117 

I 

35 

lt’88 

FIVE 

37 

58 

1 

36 

1888 

T 

1 

O 

38 

338 

1 

37 

1888 

TWELVE 

39 

349 

1 

38 

1888 

POINTS 

41 

22S 

1 

39 

1888 

fi 

41 

1 

1 

33 

1888 

Hfinnipc 

4: 

253 

1 

41 

1888 

UINOCU 

43 

232 

1 

42 

1888 

ENOOF  <p«rM-ip«c> 

44 

48 

588 

43 

588 

2 
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2 

4 

135 


1 

- 

8 

8 "NULL" 

8 

988 

8 

2 

- 

8 

181  ( 1 

8 

988 

1 

188 

3 

- 

8 

8 "NULL" 

1 

988 

8 

2 

1888 

4 

- 

8 

182  ] 1 

8 

988 

23 

188 

5 

- 

8 

8 "NULL" 

1 

988 

8 

4 

1888 

6 

- 

8 

8 "NULL" 

1 

988 

8 

2 

1888 

7 

- 

8 

291  COMPUTE 

1 

8 

988 

6 

188 

8 

1C 

5 

291  COMPUTE 

1 

8 

988 

7 

188 

9 

RH 

24  291  COMPUTE 

1 

8 

988 

8 

188 

18 

M 

13 

291  COMPUTE 

£ 

8 

988 

9 

188 

11 

- 

8 

291  COMPUTE 

1 

8 

988 

18 

188 

12 

P 

1 

291  COMPUTE 

1 

8 

988 

11 

188 

13 

Y 

18 

29  i COMPUTE 

1 

8 

988 

:? 

188 

14 

UU 

19 

1 291  COMPUTE 

1 

8 

988 

13 

188 

15 

- 

8 

291  COMPUTE 

1 

8 

988 

14 

188 

16 

T 

3 

291  COMPUTE 

1 

8 

988 

15 

188 

17 

- 

8 

8 "NULL" 

1 

988 

8 

16 

1888 

18 

- 

fa 

222  USE 

1 

8 988 

6 

188 

19 

Y 

18 

222  USE 

1 

8 988 

18 

188 

28 

UU 

19 

222  USE 

1 

8 988 

19 

188 

21 

S 

18 

222  USE 

1 

8 988 

28 

188 

22 

- 

8 

8 "NULL" 

1 

988 

8 

21 

1888 

23 

- 

8 

8 "NULL" 

2 

988 

8 

34 

588 

78 

see 

24 

- 

8 

8 "NULL” 

1 

988 

8 

16 

1888 

25 

- 

8 

8 "NULL" 

1 

988 

8 

24 

1888 

26 

- 

8 

8 "NULL" 

1 

988 

8 

24 

1888 

27 

- 

8 

252  USING 

1 

8 988 

51 

188 

28 

Y 

16 

252  USING 

1 

8 988 

27 

188 

29 

UU 

19 

252  USING 

1 

8 988 

-1--- 
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28 

188 

38 

S 

18 

252  USING 

1 

8 

999 

29 

188 

31 

1H 

28 

2S2  USING 

1 

8 

989 

38 

188 

32 

NX 

IS 

252  USING 

1 

8 

999 

31 

188 

33 

- 

8 

8 "NULL" 

1 

988 

8 

32 

1888 

34 

- 

8 

8 "NULL" 

2 

988 

9 

51 

588 

78 

588 

35 

- 

8 

8 "NULL" 

2 

988 

8 

24 

588 

24 

588 

36 

- 

A 

156  THE  1 

9 

998 

35 

188 

37 

OH 

9 

156  THE 

1 

8 

999 

36 

188 

38 

nx 

38 

156  THE 

1 

8 

989 

37 

188 

39 

- 

8 

8 "NULL" 

1 

988 

9 

38 

1888 

48 

- 

8 

388  TRRNSFORH 

1 

9 

989 

69 

188 

41 

T 

3 

388  TRRNSFORn 

1 

9 

999 

48 

188 

42 

ER 

25 

388  TRRNSFORn 

1 

9 

998 

41 

188 

43 

BE 

26 

388  TRn*<SF0RH 

1 

• 

999 

42 

188 

44 

N 

14 

388  TRRNSFORH 

1 

9 

988 

43 

188 

45 

- 

8 

388  TRRNSFORn 

1 

8 

988 

44 

188 

46 

S 

18 

388  TRRNSFORn 

1 

9 

989 

45 

188 

47 

F 

7 

398  TRRNSFORn 

1 

9 

988 

46 

188 

48 

no 

22 

388  TRRNSFORn 

1 

9 

989 

47 

188 

49 

ER 

25 

388  TRRNSFORn 

1 

8 

989 

48 

188 

58 

n 

13 

388  TRRNSFORn 

1 

9 

989 

49 

188 

51 

- 

8 

8 "NULL" 

1 

988 

8 

58 

1888 

52 

- 

8 

8 "NULL" 

1 

999 

8 

38 

1888 

53 

- 

8 

381  HILBERT 

1 

9 

999 

52 

188 

54 

HH 

12 

381  HILBERT 

1 

9 

998 

53 

188 

55 

IH 

28 

381  HILBERT 

1 

i 

1 

998 

54 

188 

56 

L 

17 

381  HILBERT 

1 

8 

998 

55 

188 

57 

- 

8 

381  HILBERT 

1 

9 

989 

56 

188 

58 

B 

2 

381  HILBERT 

1 

9 

998 
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5’  iee 

59  ER  25  301  HILBERT  1 

58  100 

60  - 0 301  HILBERT  1 

59  100 

61  T 3 301  HILBERT  1 

60  100 

62  " 0 299  FOURIER  1 

52  100 

63  F 7 299  FOURIER  1 

52  100 

6*  no  22  299  FOURIER  1 

63  100 

65  ER  25  299  FOURIER  1 

64  100 

66  IY  29  299  FOURIER  1 

65  100 

67  EH  27  299  FOURIER  1 

66  100 

68  IH  28  299  FOURIER  1 


0 

0 

0 

0 


900 

900 

900 

900 

900 

900 

900 

900 

900 

900 


67 

100 

69  - 

0 

61 

0 "NULL" 
500 

2 

900 

0 

68 

500 

70  - 

0 

21 

O "NULL" 
333 

3 

900 

0 

32 

333 

76 

334 

71  - 

0 

70 

0 "NULL" 

1000 

1 

900 

0 

72  - 

0 

70 

0 "NULL" 

1008 

1 

900 

0 

73  - 

0 

135 

251  UlfH 
100 

1 

0 

900 

74  U 

16 

73 

251  WITH 
100 

1 

0 

900 

75  IH 

28 

74 

251  UITH 

100 

1 

0 

900 

76  F 

7 

75 

251  UITH 
100 

1 

0 

900 

77  - 

0 

76 

0 "NULL" 

1000 

1 

900 

0 

78  - 

8 

135 

0 "NULL" 
500 

2 

900 

0 

78 

500 

79  - 

0 

70 

0 "NULL" 
500 

2 

900 

0 

70 

500 

CO  - 

0 

79 

i n i 

100 

0 

900 

81  RX 

30 

80 

i n i 

100 

0 

900 

82  - 

0 

81 

565  LENGTH 
100 

1 

0 

900 

83  L 

17 

82 

565  LENGTH 
100 

1 

0 

900 

84  RX 

30 

83 

565  LENGTH 

100 

1 

0 

900 

85  NX 

15 

84 

565  LENGTH 

100 

1 

0 

900 
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88 

•* 

65 

1 

565  LENGTH 
188 

1 

• III 

87 

F 

86 

7 

565  LENGTH 

188 

1 

• III 

88 

~ 

87 

I 

117  OF 
188 

1 • 

lie 

89 

AD 

88 

22 

117  DF 

188 

1 1 

1 911 

98 

V 

89 

8 

117  OF 
188 

1 f 

911 

91 

- 

98 

8 

58  FIVE 

188 

1 

1 911 

92 

F 

91 

7 

58  FIVE 

188 

1 

1 911 

93 

flfl 

92 

23 

58  FIVE 

188 

1 

1 911 

94 

AX 

93 

38 

58  FIVE 

188 

1 

1 911 

95 

V 

94 

8 

58  FIVE 

188 

1 

8 911 

96 

- 

95 

8 

336  HUNDRED 
188 

1 

• 918 

97 

HH 

96 

12 

338  HUNDREO 

188 

1 

8 988 

98 

AH 

97 

24 

338  HUNDRED 

188 

1 

8 988 

99 

N 

96 

14 

336  HUNDREO 
188 

1 

8 988 

188 

~ 

99 

8 

338  HUNDRED 
188 

1 

8 888 

181 

D 

188 

4 

336  HUNORED 
188 

1 

8 988 

182 

ER 

181 

25 

336  HUNDRED 

188 

1 

8 988 

183 

EH 

182 

27 

336  HUNDREO 

188 

1 

8 988 

184 

- 

183 

8 

336  HUNDREO 
188 

1 

8 988 

185 

D 

184 

4 

336  HUNDREO 
188 

1 

8 988 

186 

— 

185 

8 

349  TWELVE 
188 

1 

8 988 

187 

T 

186 

3 

349  TWELVE 
188 

1 

8 988 

186 

U 

16 

187 

349  TWELVE 
188 

1 

8 988 

189 

EH 

188 

27 

349  TWELVE 

188 

1 

8 688 

118 

L 

17 

189 

349  TWELVE 
188 

1 

8 988 

111 

V 

118 

8 

349  TWELVE 
188 

1 

8 988 

112 

“ 

111 

8 

225  POINTS 
188 

1 

8 988 

113 

P 

112 

1 

225  POINTS 
188 

1 

8 988 

114 

AO 

113 

22 

225  POINTS 

188 

1 

8 988 

115 

IH 

114 

225  POINTS 

188 

1 

8 988 

Jl-  - a* i~'-* 
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j 


116 

N 

14 

115 

225  POINTS 
188 

1 

8 

988 

117 

“ 

e 

116 

225  POINTS 
188 

1 

8 

988 

J 18 

T 

3 

117 

225  POINTS 
188 

1 

8 

988 

J 19 

S 

ie 

118 

225  POINTS 
188 

1 

8 

988 

120 

- 

8 

79 

1 fi  1 

108 

8 

see 

121 

fix 

38 

128 

1 A 1 

180 

8 

988 

122 

— 

8 

121 

253  HRIMING 
188 

1 

8 

988 

123 

HH 

12 

122 

253  HfiftfllNG 

188 

1 

8 

988 

1 c 4 

RE 

26 

123 

253  HfiflfllNG 

188 

1 

8 

988 

125 

n 

13 

124 

253  HflfiniNG 

188 

1 

8 

988 

126 

IH 

28 

125 

253  HfifiniNG 

188 

1 

8 

988 

127 

NX 

15 

126 

253  HfifiniNG 

188 

1 

8 

388 

128 

“ 

8 

127 

232  U1N00U 
188 

1 

8 

988 

129 

u 

16 

128 

232  U1N00U 
188 

1 

8 

988 

130 

IH 

28 

129 

232  UINQOU 

188 

1 

8 

988 

131 

N 

14 

138 

232  UINOOU 
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flP  Newa  Retrieval  Task  i 

Let me have all the stories.
Let me have all the stories.

Give me France.
Give me France.

Tell me all about Nixon.
Tell me all about Nixon.

Tell me about Watergate.
Tell me about Watergate.

Tell us all about China.
Tell us all about China.

Give us Russia.
Give us Russia.

Tell me all about Israel.
Tell me all about Israel.

Let me have the headlines.
Let me have the headlines.

Give me the summary.
Give me the summary.

Interactive formant tracking task:

I want to do formant tracking.
I want to do formant tracking.

Use a Hamming window with five hundred, twelve points.
Use a Hanning window to five hundred, four points.

Increment the window in steps of one hundred points.
Increment the window in steps of one hundred points.

For each window, compute the fast Fourier transform.
For each window, compute the fast Fourier transform.

Display the Fourier spectrum.
Display the Fourier spectrum.

Display the LPC smoothed spectrum.
Display the LPC smoothed spectrum.

Display the cepstrally smoothed spectrum.
Display the cepstrally smoothed spectrum.

Use a pre-emphasis of six db per octave.
Use a pre-emphasis of sixty db per octave.


— 


nm  uniMiai 


Do you smoke?
Do you smoke?

Do you drink?
Do you drink?

Do you have numbness?
Is your numbness?

Where is the pain?
Where is the pain?

Have you had mumps?
Is your numbness?

Are your headaches severe?
Are your headaches severe?

Are you in pain?
Are you in pain?

Where were you hospitalized?
Where were you hospitalized?

When were you immunized?
When were you immunized?

Have you been circumcised?
Have you been circumcised?

Is the pain severe?
Is the pain severe?

Have you ever been anesthetized?
Have you ever been anesthetized?

Have you ever been injured?
Have you ever been injured?

Have you ever had an operation?
Have you ever had an operation?

How often do you have nausea?
How often have you had an operation?

How long have you had asthma?
How long have you had asthma?

Is your dizziness continuous?
Is your dizziness continuous?

Are you afraid of surgery?
Are you afraid of surgery?

How much do you weigh?
How much do you smoke?

Is your urine cloudy?
Is your urine cloudy?

Were you ever hospitalized?
Were you ever hospitalized?

Voice chess task:

Pawn goes to king four.
Pawn goes to king four.

Knight moves to king bishop three.
Knight moves to king bishop three.

Bishop goes to bishop four.
Bishop goes to bishop four.

Knight on king bishop three goes to knight five.
Knight on king bishop three goes to king five.

Pawn captures pawn.
Pawn captures pawn.

Knight on king knight five captures pawn on king bishop seven.
Knight on king knight five captures pawn on king bishop seven.

Queen goes to bishop three.
Queen goes to bishop three.

Knight goes to bishop three.
Knight pawn goes to bishop three.

Knight captures knight on queen five.
Knight captures knight on pawn four.

King to queen one.
King to queen one.

Knight takes pawn.
Knight takes pawn.

Knight captures rook on queen rook eight.
Knight captures rook on queen rook two.

Queen goes to queen five.
Queen goes to queen five.

Pawn on queen two goes to queen four.
Pawn on queen two goes to queen four.

Bishop moves to knight five, check.
Bishop moves to knight five, check.

Bishop goes to knight five, check.
Bishop goes to knight five, check.

Queen on queen five captures queen, check.
Queen on queen one captures queen, check.

Queen moves to queen five, check.
King moves to queen five, check.

Queen takes bishop on queen six.
Queen takes bishop on queen six.

Rook moves to king one.
Rook moves to king one.

Rook moves to king seven, check.
Pawn moves to king seven, check.

Queen moves to queen bishop seven.
Queen moves to queen bishop seven.

Interactive formant tracking task:

I want to do formant tracking.
I want to do formant tracking.

Use a Hamming window of five hundred twelve points.
Use a Hamming window of five hundred points.

Use utterance number six of file number five.
Use utterance number six of file number five.

Increment the window in steps of one hundred points.
Increment the window in steps of four points.

For each window, display the Fourier spectrum.
For each window, display the formant tracks.

Compute the LPC smoothed spectrum using the autocorrelation method.
Compute the LPC smoothed spectrum using the autocorrelation method.

Compute the roots of the inverse filter using Bairstow's method.
Compute the roots of the inverse filter using Bairstow's method.

Display the imaginary part of the roots.
Display the imaginary part of the roots.

I want to compare the autocorrelation method with the covariance method.
I want to compare the autocorrelation method and the covariance method.

Increment the window by one hundred points.
Increment the window by one points.

Display the FFT spectrum.
Display the FFT spectrum.

Use a Hanning window of two hundred, fifty-six points.
Use a Hanning window of two hundred, six hertz.

Display the FFT spectrum.
Display the FFT spectrum.

Compute the Hilbert transform.
Use two points.

I want to look at image enhancement with different parameters.
I want to compare image enhancement with different parameters.

Display the spectrogram with a pre-emphasis of six decibels per octave.
Display the spectrogram to a pre-emphasis of six thousand five hertz.
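
Each pair in these lists gives a sentence and, on the following line, the word string returned for it. The Python sketch below shows one way such pairs could be scored at the utterance and word level. It is an illustration only: how the original evaluation counted word errors is not stated in this listing, so the sketch assumes a minimum edit distance (substitutions, insertions and deletions) between the two word strings, and all function names are hypothetical.

```python
import re


def words_of(sentence):
    """Lower-case a sentence, split hyphenated words, and drop punctuation."""
    cleaned = re.sub(r"[^a-z' ]", " ", sentence.lower().replace("-", " "))
    return cleaned.split()


def edit_distance(ref, hyp):
    """Minimum number of word substitutions, insertions and deletions."""
    row = list(range(len(hyp) + 1))          # distances for an empty reference
    for i, r in enumerate(ref, 1):
        diagonal, row[0] = row[0], i
        for j, h in enumerate(hyp, 1):
            diagonal, row[j] = row[j], min(row[j] + 1,        # delete r
                                           row[j - 1] + 1,    # insert h
                                           diagonal + (r != h))  # match or substitute
    return row[-1]


def score(pairs):
    """Return (utterance accuracy, word accuracy) for (spoken, returned) pairs."""
    exact = errors = total = 0
    for spoken, returned in pairs:
        ref, hyp = words_of(spoken), words_of(returned)
        exact += ref == hyp
        errors += edit_distance(ref, hyp)
        total += len(ref)
    return exact / len(pairs), 1 - errors / total


pairs = [
    ("Display the FFT spectrum.", "Display the FFT spectrum."),
    ("Compute the Hilbert transform.", "Use two points."),
    ("Increment the window by one hundred points.",
     "Increment the window by one points."),
]
utterance_accuracy, word_accuracy = score(pairs)
```

For the three pairs shown, one of three utterances matches exactly and 5 of the 15 spoken words are counted as errors. Other scoring conventions, for example counting only substitutions, would give different figures.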
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